Streptococcus pneumoniae Polynucleotides and Sequences
FIELD OF THE INVENTION 

The present invention relates to the field of molecular biology. In
particular, it relates to, among other things, nucleotide sequences of Streptococcus
pneumoniae, contigs, ORFs, fragments, probes, primers and related
polynucleotides thereof, peptides and polypeptides encoded by the sequences, and
uses of the polynucleotides and sequences thereof, such as in fermentation,
polypeptide production, assays and pharmaceutical development, among others.

BACKGROUND OF THE INVENTION

Streptococcus pneumoniae has been one of the most extensively studied
microorganisms since its first isolation in 1881. It was the object of many
investigations that led to important scientific discoveries. In 1928, Griffith
observed that when heat-killed encapsulated pneumococci and live strains
constitutively lacking any capsule were concomitantly injected into mice, the
nonencapsulated strains could be converted into encapsulated pneumococci with the
same capsular type as the heat-killed strain. Years later, the nature of this "transforming
principle," or carrier of genetic information, was shown to be DNA. (Avery, O.T.,
et al., J. Exp. Med., 79:137-157 (1944)).

In spite of the vast number of publications on S. pneumoniae, many
questions about its virulence are still unanswered, and this pathogen remains a
major causative agent of serious human disease, especially community-acquired
pneumonia. (Johnston, R.B., et al., Rev. Infect. Dis. 13(Suppl. 6):S509-517
(1991)). In addition, in developing countries, the pneumococcus is responsible for
the death of a large number of children under the age of 5 years from pneumococcal
pneumonia. The incidence of pneumococcal disease is highest in infants under 2
years of age and in people over 60 years of age. Pneumococci are the second most
frequent cause (after Haemophilus influenzae type b) of bacterial meningitis and
otitis media in children. With the recent introduction of conjugate vaccines for H.
influenzae type b, pneumococcal meningitis is likely to become increasingly
prominent. S. pneumoniae is the most important etiologic agent of
community-acquired pneumonia in adults and is the second most common cause of
bacterial meningitis behind Neisseria meningitidis.

The antibiotic generally prescribed to treat S. pneumoniae is
benzylpenicillin, although resistance to this and to other antibiotics is found
occasionally. Pneumococcal resistance to penicillin results from mutations in its
penicillin-binding proteins. In uncomplicated pneumococcal pneumonia caused by
a sensitive strain, treatment with penicillin is usually successful unless started too
late. Erythromycin or clindamycin can be used to treat pneumonia in patients
hypersensitive to penicillin, but strains resistant to these drugs exist. Broad
spectrum antibiotics (e.g., the tetracyclines) may also be effective, although
tetracycline-resistant strains are not rare. In spite of the availability of antibiotics,
the mortality of pneumococcal bacteremia in the last four decades has remained
stable between 25 and 29%. (Gillespie, S.H., et al., J. Med. Microbiol. 28:237-248
(1989)).

S. pneumoniae is carried in the upper respiratory tract by many healthy
individuals. It has been suggested that attachment of pneumococci is mediated by a
disaccharide receptor on fibronectin, present on human pharyngeal epithelial cells.
(Anderson, B.J., et al., J. Immunol. 142:2464-2468 (1989)). The mechanisms by
which pneumococci translocate from the nasopharynx to the lung, thereby causing
pneumonia, or migrate to the blood, giving rise to bacteremia or septicemia, are
poorly understood. (Johnston, R.B., et al., Rev. Infect. Dis. 13(Suppl. 6):S509-517
(1991)).

Various proteins have been suggested to be involved in the pathogenicity of
S. pneumoniae; however, only a few of them have actually been confirmed as
virulence factors. Pneumococci produce an IgA1 protease that might interfere with
host defense at mucosal surfaces. (Kornfield, S.J., et al., Rev. Inf. Dis. 3:521-534
(1981)). S. pneumoniae also produces neuraminidase, an enzyme that may
facilitate attachment to epithelial cells by cleaving sialic acid from the host
glycolipids and gangliosides. Partially purified neuraminidase was observed to
induce meningitis-like symptoms in mice; however, the reliability of this finding
has been questioned because the neuraminidase preparations used were probably
contaminated with cell wall products. Other pneumococcal proteins besides
neuraminidase are involved in the adhesion of pneumococci to epithelial and
endothelial cells. These pneumococcal proteins have as yet not been identified.

Recently, Cundell et al. reported that peptide permeases can modulate
pneumococcal adherence to epithelial and endothelial cells. It was, however,
unclear whether these permeases function directly as adhesins or whether they
enhance adherence by modulating the expression of pneumococcal adhesins.
(DeVelasco, E.A., et al., Micro. Rev. 59:591-603 (1995)). A better understanding
of the virulence factors determining its pathogenicity will need to be developed to
cope with the devastating effects of pneumococcal disease in humans.

Ironically, despite the prominent role of S. pneumoniae in the discovery of
DNA, little is known about the molecular genetics of the organism. The S.
pneumoniae genome consists of one circular, covalently closed, double-stranded
DNA and a collection of so-called variable accessory elements, such as prophages,
plasmids, transposons and the like. Most physical characteristics and almost all of
the genes of S. pneumoniae are unknown. Among the few that have been
identified, most have not been physically mapped or characterized in detail. Only a
few genes of this organism have been sequenced. (See, for instance, current
versions of GENBANK and other nucleic acid databases, and references that relate
to the genome of S. pneumoniae such as those set out elsewhere herein.)

It is clear that the etiology of diseases mediated or exacerbated by S.
pneumoniae infection involves the programmed expression of S. pneumoniae
genes, and that characterizing the genes and their patterns of expression would add
dramatically to our understanding of the organism and its host interactions.
Knowledge of S. pneumoniae genes and genomic organization would improve our
understanding of disease etiology and lead to improved and new ways of
preventing, ameliorating, arresting and reversing diseases. Moreover,
characterized genes and genomic fragments of S. pneumoniae would provide
reagents for, among other things, detecting, characterizing and controlling S.
pneumoniae infections. There is a need to characterize the genome of S.
pneumoniae and for polynucleotides of this organism.

SUMMARY OF THE INVENTION

The present invention is based on the sequencing of fragments of the
Streptococcus pneumoniae genome. The primary nucleotide sequences which were
generated are provided in SEQ ID NOS: 1-391.

The present invention provides the nucleotide sequence of several hundred
contigs of the Streptococcus pneumoniae genome, which are listed in tables below
and set out in the Sequence Listing submitted herewith, and representative
fragments thereof, in a form which can be readily used, analyzed, and interpreted
by a skilled artisan. In one embodiment, the present invention is provided as
contiguous strings of primary sequence information corresponding to the
nucleotide sequences depicted in SEQ ID NOS: 1-391.

The present invention further provides nucleotide sequences which are at
least 95% identical to the nucleotide sequences of SEQ ID NOS: 1-391.

The nucleotide sequence of SEQ ID NOS: 1-391, a representative fragment
thereof, or a nucleotide sequence which is at least 95% identical to the nucleotide
sequence of SEQ ID NOS: 1-391 may be provided in a variety of mediums to
facilitate its use. In one application of this embodiment, the sequences of the
present invention are recorded on computer readable media. Such media include,
but are not limited to: magnetic storage media, such as floppy discs, hard disc
storage medium, and magnetic tape; optical storage media such as CD-ROM;
electrical storage media such as RAM and ROM; and hybrids of these categories
such as magnetic/optical storage media.

The present invention further provides systems, particularly
computer-based systems, which contain the sequence information herein described
stored in a data storage means. Such systems are designed to identify commercially
important fragments of the Streptococcus pneumoniae genome.

Another embodiment of the present invention is directed to fragments of the
Streptococcus pneumoniae genome having particular structural or functional
attributes. Such fragments of the Streptococcus pneumoniae genome of the present
invention include, but are not limited to, fragments which encode peptides,
hereinafter referred to as open reading frames or ORFs, fragments which modulate
the expression of an operably linked ORF, hereinafter referred to as expression
modulating fragments or EMFs, and fragments which can be used to diagnose the
presence of Streptococcus pneumoniae in a sample, hereinafter referred to as
diagnostic fragments or DFs.
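
To make the notion of an open reading frame concrete, the following is a minimal Python sketch that scans the forward strand of a nucleotide sequence for stretches running from an ATG codon to the next in-frame stop codon. It is an illustration only and is not any ORF-calling program used in this work; the function name find_orfs, the min_codons threshold and the restriction to ATG starts on one strand are all assumptions made for brevity.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) index pairs of forward-strand ORFs.

    An ORF here is an ATG followed by an in-frame stop codon.
    `end` is exclusive and includes the stop codon; `min_codons`
    counts the start and stop codons as well.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)
```

A fuller treatment would also scan the reverse complement and allow the alternative start codons used by bacteria.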

Each of the ORFs in fragments of the Streptococcus pneumoniae genome
disclosed in Tables 1-3, and the EMFs found 5' to the ORFs, can be used in
numerous ways as polynucleotide reagents. For instance, the sequences can be
used as diagnostic probes or amplification primers for detecting or determining the
presence of a specific microbe in a sample, to selectively control gene expression in
a host and in the production of polypeptides, such as polypeptides encoded by
ORFs of the present invention, particularly those polypeptides that have a
pharmacological activity.

The present invention further includes recombinant constructs comprising
one or more fragments of the Streptococcus pneumoniae genome of the present
invention. The recombinant constructs of the present invention comprise vectors,
such as a plasmid or viral vector, into which a fragment of the Streptococcus
pneumoniae genome has been inserted.

The present invention further provides host cells containing any of the
isolated fragments of the Streptococcus pneumoniae genome of the present
invention. The host cell can be a higher eukaryotic host cell, such as a mammalian
cell, a lower eukaryotic cell, such as a yeast cell, or a procaryotic cell, such as a
bacterial cell.

The present invention is further directed to isolated polypeptides and
proteins encoded by ORFs of the present invention. A variety of methods, well
known to those of skill in the art, routinely may be utilized to obtain any of the
polypeptides and proteins of the present invention. For instance, polypeptides and
proteins of the present invention having relatively short, simple amino acid
sequences readily can be synthesized using commercially available automated
peptide synthesizers. Polypeptides and proteins of the present invention also may
be purified from bacterial cells which naturally produce the protein. Yet another
alternative is to purify polypeptides and proteins of the present invention from cells
which have been altered to express them.

The invention further provides methods of obtaining homologs of the 
fragments of the Streptococcus pneumoniae genome of the present invention and 
homologs of the proteins encoded by the ORFs of the present invention. 
Specifically, by using the nucleotide and amino acid sequences disclosed herein as
a probe or as primers, and techniques such as PCR cloning and colony/plaque
hybridization, one skilled in the art can obtain homologs. 

The invention further provides antibodies which selectively bind
polypeptides and proteins of the present invention. Such antibodies include both
monoclonal and polyclonal antibodies.

The invention further provides hybridomas which produce the above- 
described antibodies. A hybridoma is an immortalized cell line which is capable of 
secreting a specific monoclonal antibody. 

The present invention further provides methods of identifying test samples
derived from cells which express one of the ORFs of the present invention, or a
homolog thereof. Such methods comprise incubating a test sample with one or
more of the antibodies of the present invention, or one or more of the DFs of the
present invention, under conditions which allow a skilled artisan to determine if the
sample contains the ORF or product produced therefrom.

In another embodiment of the present invention, kits are provided which
contain the necessary reagents to carry out the above-described assays.

Specifically, the invention provides a compartmentalized kit to receive, in
close confinement, one or more containers which comprises: (a) a first container
comprising one of the antibodies, or one of the DFs of the present invention; and
(b) one or more other containers comprising one or more of the following: wash
reagents, and reagents capable of detecting the presence of bound antibodies or
hybridized DFs.

Using the isolated proteins of the present invention, the present invention
further provides methods of obtaining and identifying agents capable of binding to
a polypeptide or protein encoded by one of the ORFs of the present invention.
Specifically, such agents include, as further described below, antibodies, peptides,
carbohydrates, pharmaceutical agents and the like. Such methods comprise steps
of: (a) contacting an agent with an isolated protein encoded by one of the ORFs of
the present invention; and (b) determining whether the agent binds to said protein.

The present genomic sequences of Streptococcus pneumoniae will be of
great value to all laboratories working with this organism and for a variety of
commercial purposes. Many fragments of the Streptococcus pneumoniae genome
will be immediately identified by similarity searches against GenBank or protein
databases and will be of immediate value to Streptococcus pneumoniae researchers
and of immediate commercial value for the production of proteins or to control
gene expression.

The methodology and technology for elucidating extensive genomic
sequences of bacterial and other genomes have enhanced, and will greatly enhance,
the ability to analyze and understand chromosomal organization. In particular,
sequenced contigs and genomes will provide the models for developing tools for the
analysis of chromosome structure and function, including the ability to identify
genes within large segments of genomic DNA, the structure, position, and spacing of
regulatory elements, the identification of genes with potential industrial
applications, and the ability to do comparative genomics and molecular phylogeny.

DESCRIPTION OF THE FIGURES 

FIGURE 1 is a block diagram of a computer system (102) that can be
used to implement computer-based systems of the present invention.

FIGURE 2 is a schematic diagram depicting the data flow and computer
programs used to collect, assemble, edit and annotate the contigs of the
Streptococcus pneumoniae genome of the present invention. Both Macintosh and
Unix platforms are used to handle the AB 373 and 377 sequence data files, largely
as described in Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii
International Conference on System Sciences, 585, IEEE Computer Society Press,
Washington D.C. (1993). Factura (AB) is a Macintosh program designed for
automatic vector sequence removal and end-trimming of sequence files. The
program Loadis runs on a Macintosh platform and parses the feature data extracted
from the sequence files by Factura to the Unix based Streptococcus pneumoniae
relational database. Assembly of contigs (and whole genome sequences) is
accomplished by retrieving a specific set of sequence files and their associated
features using Extrseq, a Unix utility for retrieving sequences from an SQL
database. The resulting sequence file is processed by seq_filter to trim portions of
the sequences with more than 2% ambiguous nucleotides. The sequence files were
assembled using TIGR Assembler, an assembly engine designed at The Institute
for Genomic Research (TIGR) for rapid and accurate assembly of thousands of
sequence fragments. The collection of contigs generated by the assembly step is
loaded into the database with the lassie program. Identification of open reading
frames (ORFs) is accomplished by processing contigs with zorf or GenMark. The
ORFs are searched against S. pneumoniae sequences from GenBank and against all
protein sequences using the BLASTN and BLASTP programs, described in
Altschul et al., J. Mol. Biol. 215:403-410 (1990). Results of the ORF
determination and similarity searching steps were loaded into the database. As
described below, some results of the determination and the searches are set out in
Tables 1-3.
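
The 2% ambiguity criterion mentioned for seq_filter can be illustrated with a short Python sketch. The internal workings of seq_filter are not described here, so what follows is only one plausible reading: it keeps the longest leading portion of a read whose fraction of ambiguous base calls does not exceed a threshold. The function names ambiguous_fraction and trim_ambiguous_tail, and the choice to trim only from the 3' end, are assumptions.

```python
def ambiguous_fraction(seq):
    """Fraction of characters that are not an unambiguous A, C, G or T."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return sum(base not in "ACGT" for base in seq) / len(seq)


def trim_ambiguous_tail(seq, max_fraction=0.02):
    """Return the longest prefix whose ambiguous fraction is <= max_fraction.

    Hypothetical stand-in for a quality trim: sequencing reads tend to
    accumulate ambiguous calls (e.g. N) toward their 3' end.
    """
    ambiguous = 0
    best = 0
    for i, base in enumerate(seq.upper(), start=1):
        if base not in "ACGT":
            ambiguous += 1
        if ambiguous <= max_fraction * i:
            best = i  # prefix of length i still satisfies the threshold
    return seq[:best]
```

For a read of 100 clean bases followed by four N calls, this sketch retains 102 bases, since two ambiguous calls in 102 positions is still under 2%.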

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS 

The present invention is based on the sequencing of fragments of the
Streptococcus pneumoniae genome and analysis of the sequences. The primary
nucleotide sequences generated by sequencing the fragments are provided in SEQ
ID NOS: 1-391. (As used herein, the "primary sequence" refers to the nucleotide
sequence represented by the IUPAC nomenclature system.)

In addition to the aforementioned Streptococcus pneumoniae polynucleotide 
and polynucleotide sequences, the present invention provides the nucleotide 
sequences of SEQ ID NOS: 1-391, or representative fragments thereof, in a form 
which can be readily used, analyzed, and interpreted by a skilled artisan. 

As used herein, a "representative fragment of the nucleotide sequence
depicted in SEQ ID NOS: 1-391" refers to any portion of the SEQ ID NOS: 1-391
which is not presently represented within a publicly available database. Preferred
representative fragments of the present invention are Streptococcus pneumoniae
open reading frames (ORFs), expression modulating fragments (EMFs) and
fragments which can be used to diagnose the presence of Streptococcus
pneumoniae in a sample (DFs). A non-limiting identification of preferred
representative fragments is provided in Tables 1-3. As discussed in detail below,
the information provided in SEQ ID NOS: 1-391 and in Tables 1-3 together with
routine cloning, synthesis, sequencing and assay methods will enable those skilled
in the art to clone and sequence all "representative fragments" of interest, including
open reading frames encoding a large variety of Streptococcus pneumoniae
proteins.

While the presently disclosed sequences of SEQ ID NOS: 1-391 are highly
accurate, sequencing techniques are not perfect and, in relatively rare instances,
further investigation of a fragment or sequence of the invention may reveal a
nucleotide sequence error present in a nucleotide sequence disclosed in SEQ ID
NOS: 1-391. However, once the present invention is made available (i.e., once the
information in SEQ ID NOS: 1-391 and Tables 1-3 has been made available),
resolving a rare sequencing error in SEQ ID NOS: 1-391 will be well within the
skill of the art. The present disclosure makes available sufficient sequence
information to allow any of the described contigs or portions thereof to be obtained
readily by straightforward application of routine techniques. Further sequencing of
such polynucleotides may proceed in like manner using manual and automated
sequencing methods which are employed ubiquitously in the art. Nucleotide
sequence editing software is publicly available. For example, Applied Biosystems'
(AB) AutoAssembler can be used as an aid during visual inspection of nucleotide
sequences. By employing such routine techniques potential errors readily may be
identified and the correct sequence then may be ascertained by targeting further
sequencing effort, also of a routine nature, to the region containing the potential
error.

Even if all of the very rare sequencing errors in SEQ ID NOS: 1-391 were 
corrected, the resulting nucleotide sequences would still be at least 95% identical, 
nearly all would be at least 99% identical, and the great majority would be at least 
99.9% identical to the nucleotide sequences of SEQ ID NOS: 1-391. 

As discussed elsewhere herein, polynucleotides of the present invention
readily may be obtained by routine application of well known and standard
procedures for cloning and sequencing DNA. Detailed methods for obtaining
libraries and for sequencing are provided below, for instance. A wide variety of
Streptococcus pneumoniae strains that can be used to prepare S. pneumoniae
genomic DNA for cloning and for obtaining polynucleotides of the present
invention are available to the public from recognized depository institutions, such
as the American Type Culture Collection (ATCC). While the present invention is
enabled by the sequences and other information herein disclosed, the S.
pneumoniae strain that provided the DNA of the present Sequence Listing, Strain
7/87 14.8.91, has been deposited in the ATCC, as a convenience to those of skill
in the art. As a further convenience, a library of S. pneumoniae genomic DNA,
derived from the same strain, also has been deposited in the ATCC. The S.
pneumoniae strain was deposited on October 10, 1996, and was given Deposit No.
55840, and the library was deposited on October 11, 1996 and was given
Deposit No. 97755. The genomic fragments in the library are 15 to 20 kb
fragments generated by partial Sau3AI digestion and they are inserted into the
BamHI site in the well-known lambda-derived vector lambda DASH II (Stratagene,
La Jolla, CA). The provision of the deposits is not a waiver of any rights of the
inventors or their assignees in the present subject matter.

The nucleotide sequences of the genomes from different strains of
Streptococcus pneumoniae differ somewhat. However, the nucleotide sequences
of the genomes of all Streptococcus pneumoniae strains will be at least 95%
identical, in corresponding part, to the nucleotide sequences provided in SEQ ID
NOS: 1-391. Nearly all will be at least 99% identical and the great majority will be
99.9% identical.

Thus, the present invention further provides nucleotide sequences which 
are at least 95%, preferably 99% and most preferably 99.9% identical to the 
nucleotide sequences of SEQ ID NOS: 1-391, in a form which can be readily used, 
analyzed and interpreted by the skilled artisan. 

Methods for determining whether a nucleotide sequence is at least 95%, at
least 99% or at least 99.9% identical to the nucleotide sequences of SEQ ID
NOS: 1-391 are routine and readily available to the skilled artisan. For example, the
well known FASTA algorithm described in Pearson and Lipman, Proc. Natl. Acad.
Sci. USA 85: 2444 (1988) can be used to generate the percent identity of nucleotide
sequences. The BLASTN program also can be used to generate an identity score
of polynucleotides compared to one another.
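
By way of illustration only, the arithmetic behind a percent identity figure can be sketched in Python for two sequences that have already been aligned to equal length. This is not the FASTA or BLASTN algorithm, both of which also compute the alignment itself; the function name percent_identity, the use of "-" as the gap character and the choice to count gapped columns as mismatches are assumptions of this sketch.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns of a pairwise alignment.

    Columns where both sequences have a gap are ignored; a column with
    a gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(a == b and a != gap for a, b in columns)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=95.0):
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, two aligned 10-mers differing at one position are 90% identical and would not meet a 95% threshold.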

COMPUTER RELATED EMBODIMENTS 

The nucleotide sequences provided in SEQ ID NOS: 1-391, a representative
fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99%
and most preferably at least 99.9% identical to a polynucleotide sequence of SEQ
ID NOS: 1-391 may be "provided" in a variety of mediums to facilitate use thereof.
As used herein, "provided" refers to a manufacture, other than an isolated nucleic
acid molecule, which contains a nucleotide sequence of the present invention; i.e.,
a nucleotide sequence provided in SEQ ID NOS: 1-391, a representative fragment
thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most
preferably at least 99.9% identical to a polynucleotide of SEQ ID NOS: 1-391.
Such a manufacture provides a large portion of the Streptococcus pneumoniae
genome and parts thereof (e.g., a Streptococcus pneumoniae open reading frame
(ORF)) in a form which allows a skilled artisan to examine the manufacture using



WO 98/18931 PCT/US97/19588 

means not directly applicable to examining the Streptococcus pneumoniae genome 
or a subset thereof as it exists in nature or in purified form. 

In one application of this embodiment, a nucleotide sequence of the present
invention can be recorded on computer readable media. As used herein, "computer
readable media" refers to any medium which can be read and accessed directly by a
computer. Such media include, but are not limited to: magnetic storage media,
such as floppy discs, hard disc storage medium, and magnetic tape; optical storage
media such as CD-ROM; electrical storage media such as RAM and ROM; and
hybrids of these categories, such as magnetic/optical storage media. A skilled
artisan can readily appreciate how any of the presently known computer readable
mediums can be used to create a manufacture comprising computer readable
medium having recorded thereon a nucleotide sequence of the present invention.
Likewise, it will be clear to those of skill how additional computer readable media
that may be developed also can be used to create analogous manufactures having
recorded thereon a nucleotide sequence of the present invention.

As used herein, "recorded" refers to a process for storing information on
computer readable medium. A skilled artisan can readily adopt any of the presently
known methods for recording information on computer readable medium to generate
manufactures comprising the nucleotide sequence information of the present
invention. A variety of data storage structures are available to a skilled artisan
for creating a computer readable medium having recorded thereon a nucleotide
sequence of the present invention. The choice of the data storage structure will
generally be based on the means chosen to access the stored information. In
addition, a variety of data processor programs and formats can be used to store the
nucleotide sequence information of the present invention on computer readable
medium. The sequence information can be represented in a word processing text
file, formatted in commercially available software such as WordPerfect and
Microsoft Word, or represented in the form of an ASCII file, stored in a database
application, such as DB2, Sybase, Oracle, or the like. A skilled artisan can readily
adapt any number of data-processor structuring formats (e.g., text file or database)
in order to obtain computer readable medium having recorded thereon the
nucleotide sequence information of the present invention.
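
The two storage options named above, a plain ASCII text file and a relational database, can be sketched briefly in Python. The sketch substitutes SQLite, through Python's standard sqlite3 module, for the database applications named in the text, and uses the common FASTA text layout for the ASCII form. The table name contig, the record name contig_1 and the helper names are hypothetical and chosen purely for illustration.

```python
import sqlite3


def to_fasta(name, seq, width=60):
    """Format one sequence as FASTA-style ASCII text, wrapped at `width`."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def store_sequences(conn, records):
    """Record (name, sequence) pairs in a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contig ("
        "name TEXT PRIMARY KEY, seq TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO contig (name, seq) VALUES (?, ?)", records)
    conn.commit()


def fetch_sequence(conn, name):
    """Return the stored sequence for `name`, or None if it is absent."""
    row = conn.execute(
        "SELECT seq FROM contig WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    store_sequences(conn, [("contig_1", "ACGTACGT")])
    print(to_fasta("contig_1", fetch_sequence(conn, "contig_1")), end="")
```

As the text notes, the choice between a flat file and a database would generally follow from how the stored information is to be accessed.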

Computer software is publicly available which allows a skilled artisan to
access sequence information provided in a computer readable medium. Thus, by
providing in computer readable form the nucleotide sequences of SEQ ID NOS:
1-391, a representative fragment thereof, or a nucleotide sequence at least 95%,
preferably at least 99% and most preferably at least 99.9% identical to a sequence
of SEQ ID NOS: 1-391, the present invention enables the skilled artisan routinely to
access the provided sequence information for a wide variety of purposes.

The examples which follow demonstrate how software which implements
the BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE
(Brutlag et al., Comp. Chem. 17:203-207 (1993)) search algorithms on a Sybase
system was used to identify open reading frames (ORFs) within the Streptococcus
pneumoniae genome which contain homology to ORFs or proteins from both
Streptococcus pneumoniae and from other organisms. Among the ORFs discussed
herein are protein encoding fragments of the Streptococcus pneumoniae genome
useful in producing commercially important proteins, such as enzymes used in
fermentation reactions and in the production of commercially useful metabolites.

The present invention further provides systems, particularly
computer-based systems, which contain the sequence information described herein.
Such systems are designed to identify, among other things, commercially important
fragments of the Streptococcus pneumoniae genome.

As used herein, "a computer-based system" refers to the hardware means,
software means, and data storage means used to analyze the nucleotide sequence
information of the present invention. The minimum hardware means of the
computer-based systems of the present invention comprises a central processing
unit (CPU), input means, output means, and data storage means. A skilled artisan
can readily appreciate that any one of the currently available computer-based
systems is suitable for use in the present invention.

As stated above, the computer-based systems of the present invention
comprise a data storage means having stored therein a nucleotide sequence of the
present invention and the necessary hardware means and software means for
supporting and implementing a search means.

As used herein, "data storage means" refers to memory which can store
nucleotide sequence information of the present invention, or a memory access
means which can access manufactures having recorded thereon the nucleotide
sequence information of the present invention.

As used herein, "search means" refers to one or more programs which are
implemented on the computer-based system to compare a target sequence or target
structural motif with the sequence information stored within the data storage
means. Search means are used to identify fragments or regions of the present
genomic sequences which match a particular target sequence or target motif. A
variety of known algorithms are disclosed publicly and a variety of commercially
available software for conducting search means are and can be used in the
computer-based systems of the present invention. Examples of such software
include, but are not limited to, MacPattern (EMBL), BLASTN and BLASTX
(NCBI). A skilled artisan can readily recognize that any one of the available
algorithms or implementing software packages for conducting homology searches
can be adapted for use in the present computer-based systems.

As used herein, a "target sequence" can be any DNA or amino acid
sequence of six or more nucleotides or two or more amino acids. A skilled artisan
can readily recognize that the longer a target sequence is, the less likely a target
sequence will be present as a random occurrence in the database. The most
preferred sequence length of a target sequence is from about 10 to 100 amino acids
or from about 30 to 300 nucleotide residues. However, it is well recognized that
searches for commercially important fragments, such as sequence fragments
involved in gene expression and protein processing, may be of shorter length.
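
The relationship between target length and chance occurrence can be made concrete with a short calculation. The Python sketch below assumes, purely for illustration, that every residue in the database is independent and equally likely, which real genomes only approximate; under that assumption the expected number of exact chance matches of a target of length n in a database of N residues is (N - n + 1) divided by the alphabet size raised to the power n. The function name expected_random_hits and its parameters are hypothetical.

```python
def expected_random_hits(target_length, database_length, alphabet_size=4):
    """Expected number of exact chance matches of a target in a database.

    Assumes independent, uniformly distributed residues. Use
    alphabet_size=4 for nucleotides and alphabet_size=20 for amino acids.
    """
    positions = max(database_length - target_length + 1, 0)
    return positions / alphabet_size ** target_length


if __name__ == "__main__":
    # Arbitrary illustrative database size, not a figure from this document.
    for n in (6, 10, 30):
        print(n, expected_random_hits(n, 1_000_000))
```

Under these assumptions each additional nucleotide reduces the expected number of chance matches roughly fourfold, which is why a six-nucleotide target is expected to recur by chance many times in a large database while a 30-nucleotide target is not.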

As used herein, "a target structural motif," or "target motif," refers to any
rationally selected sequence or combination of sequences in which the sequence(s)
are chosen based on a three-dimensional configuration which is formed upon the
folding of the target motif. There are a variety of target motifs known in the art.
Protein target motifs include, but are not limited to, enzymic active sites and signal
sequences. Nucleic acid target motifs include, but are not limited to, promoter
sequences, hairpin structures and inducible expression elements (protein binding
sequences).

A variety of structural formats for the input and output means can be used
to input and output the information in the computer-based systems of the present
invention. A preferred format for an output means ranks fragments of the
Streptococcus pneumoniae genomic sequences possessing varying degrees of
homology to the target sequence or target motif. Such presentation provides a
skilled artisan with a ranking of sequences which contain various amounts of the
target sequence or target motif and identifies the degree of homology contained in
the identified fragment.
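
A ranked output of the kind described above might be sketched as follows. This
is an illustration only: it scores each fragment by its best ungapped percent
identity to the target, which is a simplified stand-in for the homology scores
reported by programs such as BLAST, and the function names and example
sequences are hypothetical.

```python
def best_identity(fragment: str, target: str) -> float:
    """Best ungapped percent identity of the target against any window of the fragment."""
    n, m = len(fragment), len(target)
    if m == 0 or n < m:
        return 0.0
    best = 0
    for i in range(n - m + 1):
        # Count positions where the window and the target agree.
        matches = sum(a == b for a, b in zip(fragment[i:i + m], target))
        best = max(best, matches)
    return 100.0 * best / m


def rank_fragments(fragments: dict[str, str], target: str) -> list[tuple[str, float]]:
    """Return (fragment name, percent identity) pairs, best match first."""
    scored = [(name, best_identity(seq, target)) for name, seq in fragments.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fragments and target, for illustration only.
fragments = {"frag_a": "ACGTTGACCA", "frag_b": "TTGACATTTT", "frag_c": "GGGGGGGGGG"}
for name, identity in rank_fragments(fragments, "TTGACA"):
    print(f"{name}\t{identity:.1f}%")
```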

A variety of comparing means can be used to compare a target sequence or
target motif with the data storage means to identify sequence fragments of the
Streptococcus pneumoniae genome. In the present examples, implementing
software which implements the BLAST and BLAZE algorithms, described in
Altschul et al., J. Mol. Biol. 215: 403-410 (1990), is used to identify open reading
frames within the Streptococcus pneumoniae genome. A skilled artisan can readily
recognize that any one of the publicly available homology search programs can be
used as the search means for the computer-based systems of the present invention.
Of course, suitable proprietary systems that may be known to those of skill also
may be employed in this regard.

Figure 1 provides a block diagram of a computer system illustrative of
embodiments of this aspect of the present invention. The computer system 102
includes a processor 106 connected to a bus 104. Also connected to the bus 104
are a main memory 108 (preferably implemented as random access memory, RAM)
and a variety of secondary storage devices 110, such as a hard drive 112 and a
removable medium storage device 114. The removable medium storage device 114
may represent, for example, a floppy disk drive, a CD-ROM drive, a magnetic tape
drive, etc. A removable storage medium 116 (such as a floppy disk, a compact
disk, a magnetic tape, etc.) containing control logic and/or data recorded therein
may be inserted into the removable medium storage device 114. The computer
system 102 includes appropriate software for reading the control logic and/or the
data from the removable storage medium 116, once it is inserted into the
removable medium storage device 114.

A nucleotide sequence of the present invention may be stored in a well
known manner in the main memory 108, any of the secondary storage devices 110,
and/or a removable storage medium 116. During execution, software for accessing
and processing the genomic sequence (such as search tools, comparing tools, etc.)
resides in main memory 108, in accordance with the requirements and operating
parameters of the operating system, the hardware system and the software program
or programs.

BIOCHEMICAL EMBODIMENTS

Other embodiments of the present invention are directed to isolated
fragments of the Streptococcus pneumoniae genome. The fragments of the
Streptococcus pneumoniae genome of the present invention include, but are not
limited to, fragments which encode peptides and polypeptides, hereinafter open
reading frames (ORFs), fragments which modulate the expression of an operably
linked ORF, hereinafter expression modulating fragments (EMFs) and fragments
which can be used to diagnose the presence of Streptococcus pneumoniae in a
sample, hereinafter diagnostic fragments (DFs).

As used herein, an "isolated nucleic acid molecule" or an "isolated fragment
of the Streptococcus pneumoniae genome" refers to a nucleic acid molecule
possessing a specific nucleotide sequence which has been subjected to purification
means to reduce, from the composition, the number of compounds which are
normally associated with the composition. Particularly, the term refers to the
nucleic acid molecules having the sequences set out in SEQ ID NOS: 1-391, to
representative fragments thereof as described above, to polynucleotides at least
95%, preferably at least 99% and especially preferably at least 99.9% identical in
sequence thereto, also as set out above.

A variety of purification means can be used to generate the isolated
fragments of the present invention. These include, but are not limited to, methods
which separate constituents of a solution based on charge, solubility, or size.

In one embodiment, Streptococcus pneumoniae DNA can be enzymatically
sheared to produce fragments of 15-20 kb in length. These fragments can then be
used to generate a Streptococcus pneumoniae library by inserting them into lambda
clones as described in the Examples below. Primers flanking, for example, an
ORF, such as those enumerated in Tables 1-3, can then be generated using
nucleotide sequence information provided in SEQ ID NOS: 1-391. Well known
and routine techniques of PCR cloning then can be used to isolate the ORF from
the lambda DNA library or Streptococcus pneumoniae genomic DNA. Thus, given
the availability of SEQ ID NOS: 1-391, the information in Tables 1, 2 and 3, and
the information that may be obtained readily by analysis of the sequences of SEQ
ID NOS: 1-391 using methods set out above, those of skill will be enabled by the
present disclosure to isolate any ORF-containing or other nucleic acid fragment of
the present invention.

The isolated nucleic acid molecules of the present invention include, but are
not limited to, single stranded and double stranded DNA, and single stranded RNA.

As used herein, an "open reading frame," ORF, means a series of triplets
coding for amino acids without any termination codons and is a sequence
translatable into protein.

Tables 1, 2, and 3 list ORFs in the Streptococcus pneumoniae genomic
contigs of the present invention that were identified as putative coding regions by
the GeneMark software using organism-specific second-order Markov probability
transition matrices. It will be appreciated that other criteria can be used, in
accordance with well known analytical methods, such as those discussed herein, to
generate more inclusive, more restrictive, or more selective lists.

Table 1 sets out ORFs in the Streptococcus pneumoniae contigs of the
present invention that over a continuous region of at least 50 bases are 95% or
more identical (by BLAST analysis) to a nucleotide sequence available through
GenBank in October, 1997.

Table 2 sets out ORFs in the Streptococcus pneumoniae contigs of the 
present invention that are not in Table 1 and match, with a BLASTP probability 
score of 0.01 or less, a polypeptide sequence available through GenBank in 
October, 1997. 

Table 3 sets out ORFs in the Streptococcus pneumoniae contigs of the
present invention that do not match significantly, by BLASTP analysis, a
polypeptide sequence available through GenBank in October, 1997.

In each table, the first and second columns identify the ORF by,
respectively, contig number and ORF number within the contig; the third column
indicates the first nucleotide of the ORF (actually the first nucleotide of the stop
codon immediately preceding the ORF), counting from the 5' end of the contig
strand; and the fourth column, "stop (nt)," indicates the last nucleotide of the stop
codon defining the 3' end of the ORF.

In Tables 1 and 2, column five lists the reference for the closest
matching sequence available through GenBank. These reference numbers are the
database entry numbers commonly used by those of skill in the art, who will be
familiar with their denominations. Descriptions of the nomenclature are available
from the National Center for Biotechnology Information. Column six in Tables 1
and 2 provides the gene name of the matching sequence; column seven provides
the BLAST identity score and column eight the BLAST similarity score from the
comparison of the ORF and the homologous gene; and column nine indicates the
length in nucleotides of the highest scoring segment pair identified by the BLAST
identity analysis.

Each ORF described in the tables is defined by "start (nt)" (5') and "stop
(nt)" (3') nucleotide position numbers. These position numbers refer to the
boundaries of each ORF and provide orientation with respect to whether the
forward or reverse strand is the coding strand and in which reading frame the
coding sequence is contained. The "start" position is the first nucleotide of the
triplet encoding a stop codon just 5' to the ORF and the "stop" position is the last
nucleotide of the triplet encoding the next in-frame stop codon (i.e., the stop codon
at the 3' end of the ORF). Those of ordinary skill in the art appreciate that
preferred fragments within each ORF described in the tables include fragments of
each ORF which include the entire sequence from the delineated "start" and "stop"
positions excepting the first and last three nucleotides since these encode stop
codons. Thus, polynucleotides set out as ORFs in the tables but lacking the three
(3) 5' nucleotides and the three (3) 3' nucleotides are encompassed by the present
invention. Those of skill also appreciate that particularly preferred are fragments
within each ORF that are polynucleotide fragments comprising polypeptide coding
sequence. As defined herein, "coding sequence" includes the fragment within an
ORF beginning at the first in-frame ATG (triplet encoding methionine) and ending
with the last nucleotide prior to the triplet encoding the 3' stop codon. Preferred
are fragments comprising the entire coding sequence and fragments comprising the
entire coding sequence, excepting the coding sequence for the N-terminal
methionine. Those of skill appreciate that the N-terminal methionine is often
removed during post-translational processing and that polynucleotides lacking the
ATG can be used to facilitate production of N-terminal fusion proteins which may
be beneficial in the production or use of genetically engineered proteins. Of course,
due to the degeneracy of the genetic code, many polynucleotides can encode a given
polypeptide. Thus, the invention further includes polynucleotides comprising a
nucleotide sequence encoding a polypeptide sequence itself encoded by the coding
sequence within an ORF described in Tables 1-3 herein. Further, polynucleotides
at least 95%, preferably at least 99% and especially preferably at least 99.9%
identical in sequence to the foregoing polynucleotides, are contemplated by the
present invention.
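
The coordinate convention described above can be made concrete with a short
sketch. It assumes a forward-strand ORF whose "start" and "stop" positions are
1-based and inclusive, and it is not the software used to build the tables;
the function name and the toy contig are hypothetical, and reverse-strand ORFs
are deliberately not handled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def coding_sequence(contig: str, start_nt: int, stop_nt: int) -> str:
    """Extract the coding sequence from a forward-strand, stop-to-stop ORF entry.

    start_nt is the first nucleotide of the 5' stop codon and stop_nt is the
    last nucleotide of the 3' stop codon (both 1-based, inclusive).
    """
    span = contig[start_nt - 1:stop_nt].upper()
    if len(span) < 6 or len(span) % 3 != 0:
        raise ValueError("span is not a whole number of codons")
    if span[:3] not in STOP_CODONS or span[-3:] not in STOP_CODONS:
        raise ValueError("span is not bounded by stop codons")
    orf = span[3:-3]                      # drop the two flanking stop codons
    for i in range(0, len(orf), 3):       # walk the reading frame codon by codon
        if orf[i:i + 3] == "ATG":
            return orf[i:]                # first in-frame ATG through last sense codon
    return ""                             # no in-frame ATG in this ORF


# Toy contig: 2 nt of flank, stop, GCT, ATG, AAA, GGT, stop, 2 nt of flank.
contig = "CC" + "TAA" + "GCT" + "ATG" + "AAA" + "GGT" + "TGA" + "CC"
print(coding_sequence(contig, 3, 20))
```

Dropping the leading ATG from the returned string gives the variant lacking the
codon for the N-terminal methionine.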
Polypeptides encoded by polynucleotides described above and elsewhere
herein are also provided by the present invention, as are polypeptides comprising
an amino acid sequence at least about 95%, preferably at least 97% and even more
preferably 99% identical to the amino acid sequence of a polypeptide encoded by an
ORF shown in Tables 1-3. These polypeptides may or may not comprise an
N-terminal methionine.

The concepts of percent identity and percent similarity of two polypeptide
sequences are well understood in the art. For example, two polypeptides 10 amino
acids in length which differ at three amino acid positions (e.g., at positions 1, 3
and 5) are said to have a percent identity of 70%. However, the same two
polypeptides would be deemed to have a percent similarity of 80% if, for example
at position 5, the amino acid moieties, although not identical, were "similar" (i.e.,
possessed similar biochemical characteristics). Many programs for analysis of
nucleotide or amino acid sequence similarity, such as FASTA and BLAST, specifically
list percent identity of a matching region as an output parameter. Thus, for
instance, Tables 1 and 2 herein enumerate the percent identity of the highest
scoring segment pair in each ORF and its listed relative. Further details
concerning the algorithms and criteria used for homology searches are provided
below and are described in the pertinent literature highlighted by the citations
provided below.
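
The 70% identity and 80% similarity figures in the example above can be
reproduced with a small sketch. The similarity groups below are a hypothetical
simplification introduced only for illustration (programs such as BLAST use
substitution matrices instead), and the two peptide sequences are invented.

```python
# Hypothetical groups of biochemically similar residues, for illustration only.
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def percent_identity_similarity(a: str, b: str) -> tuple[float, float]:
    """Return (percent identity, percent similarity) for two equal-length peptides."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(x == y for x, y in zip(a, b))
    # A position counts as similar if the residues differ but share a group.
    similar = sum(
        x != y and any(x in group and y in group for group in SIMILAR_GROUPS)
        for x, y in zip(a, b)
    )
    n = len(a)
    return 100.0 * identical / n, 100.0 * (identical + similar) / n


# Two invented 10-residue peptides differing at positions 1, 3 and 5;
# only position 5 (L versus I) falls within a similarity group.
print(percent_identity_similarity("MKTLLDEAGS", "AKGLIDEAGS"))
```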

It will be appreciated that other criteria can be used to generate more
inclusive and more exclusive listings of the types set out in the tables. As those of
skill will appreciate, narrow and broad searches both are useful. Thus, a skilled
artisan can readily identify ORFs in contigs of the Streptococcus pneumoniae
genome other than those listed in Tables 1-3, such as ORFs which are overlapping
or encoded by the opposite strand of an identified ORF in addition to those
ascertainable using the computer-based systems of the present invention.

As used herein, an "expression modulating fragment," EMF, means a
series of nucleotide molecules which modulates the expression of an operably
linked ORF or EMF.

As used herein, a sequence is said to "modulate the expression of an
operably linked sequence" when the expression of the sequence is altered by the
presence of the EMF. EMFs include, but are not limited to, promoters, and
promoter modulating sequences (inducible elements). One class of EMFs is
fragments which induce the expression of an operably linked ORF in response to a
specific regulatory factor or physiological event.

EMF sequences can be identified within the contigs of the Streptococcus
pneumoniae genome by their proximity to the ORFs provided in Tables 1-3. An
intergenic segment, or a fragment of the intergenic segment, from about 10 to 200
nucleotides in length, taken from any one of the ORFs of Tables 1-3 will modulate
the expression of an operably linked ORF in a fashion similar to that found with the
naturally linked ORF sequence. As used herein, an "intergenic segment" refers to
fragments of the Streptococcus pneumoniae genome which are between two
ORFs herein described. EMFs also can be identified using known EMFs as a
target sequence or target motif in the computer-based systems of the present
invention. Further, the two methods can be combined and used together.
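
Locating candidate intergenic segments from ORF coordinates might be sketched
as follows. This is a hedged illustration, not the procedure of the
specification: it assumes 1-based, inclusive coordinates on one contig, and
when a gap exceeds the upper length it keeps the bases nearest the downstream
ORF, which is an arbitrary choice made here. The function name and the toy
contig are hypothetical.

```python
def intergenic_segments(
    contig: str,
    orfs: list[tuple[int, int]],
    min_len: int = 10,
    max_len: int = 200,
) -> list[tuple[int, int, str]]:
    """Return (first_nt, last_nt, sequence) for gaps lying between two ORFs."""
    ordered = sorted((min(a, b), max(a, b)) for a, b in orfs)
    segments = []
    covered_to = 0                      # last position covered by any ORF seen so far
    for start, stop in ordered:
        gap_start, gap_end = covered_to + 1, start - 1
        # Skip the region before the first ORF; it is not between two ORFs.
        if covered_to and gap_end - gap_start + 1 >= min_len:
            # Trim long gaps to the max_len bases nearest the downstream ORF.
            gap_start = max(gap_start, gap_end - max_len + 1)
            segments.append((gap_start, gap_end, contig[gap_start - 1:gap_end]))
        covered_to = max(covered_to, stop)
    return segments


# Toy contig: a 30-nt ORF, a 15-nt gap, then another 30-nt ORF.
contig = "A" * 30 + "C" * 15 + "G" * 30
print(intergenic_segments(contig, [(1, 30), (46, 75)]))
```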

The presence and activity of an EMF can be confirmed using an EMF trap
vector. An EMF trap vector contains a cloning site linked to a marker sequence. A
marker sequence encodes an identifiable phenotype, such as antibiotic resistance or
a complementing nutrition auxotrophic factor, which can be identified or assayed
when the EMF trap vector is placed within an appropriate host under appropriate
conditions. As described above, an EMF will modulate the expression of an
operably linked marker sequence. A more detailed discussion of various marker
sequences is provided below. A sequence which is suspected as being an EMF is
cloned in all three reading frames in one or more restriction sites upstream from the
marker sequence in the EMF trap vector. The vector is then transformed into an
appropriate host using known procedures and the phenotype of the transformed
host is examined under appropriate conditions. As described above, an EMF will
modulate the expression of an operably linked marker sequence.

As used herein, a "diagnostic fragment," DF, means a series of nucleotide
molecules which selectively hybridize to Streptococcus pneumoniae sequences.
DFs can be readily identified by identifying unique sequences within contigs of the
Streptococcus pneumoniae genome, such as by using well-known computer
analysis software, and by generating and testing probes or amplification primers
consisting of the DF sequence in an appropriate diagnostic format which
determines amplification or hybridization selectivity.

The sequences falling within the scope of the present invention are not
limited to the specific sequences herein described, but also include allelic and
species variations thereof. Allelic and species variations can be routinely
determined by comparing the sequences provided in SEQ ID NOS: 1-391, a
representative fragment thereof, or a nucleotide sequence at least 95%, preferably
at least 99% and most preferably at least 99.9% identical to SEQ ID NOS: 1-391,
with a sequence from another isolate of the same species. Furthermore, to
accommodate codon variability, the invention includes nucleic acid molecules
coding for the same amino acid sequences as do the specific ORFs disclosed
herein. In other words, in the coding region of an ORF, substitution of one codon
for another which encodes the same amino acid is expressly contemplated. Any
specific sequence disclosed herein can be readily screened for errors by
resequencing a particular fragment, such as an ORF, in both directions (i.e.,
sequence both strands). Alternatively, error screening can be performed by
sequencing corresponding polynucleotides of Streptococcus pneumoniae origin
isolated by using part or all of the fragments in question as a probe or primer.

Preferred DFs of the present invention comprise at least about 17,
preferably at least about 20, and more preferably at least about 50 contiguous
nucleotides within an ORF set out in Tables 1-3. Most highly preferred DFs
specifically hybridize to a polynucleotide containing the sequence of the ORF from
which they are derived. Specific hybridization occurs even under stringent
conditions defined elsewhere herein.

Each of the ORFs of the Streptococcus pneumoniae genome disclosed in
Tables 1, 2 and 3, and the EMFs found 5' to the ORFs, can be used as
polynucleotide reagents in numerous ways. For example, the sequences can be
used as diagnostic probes or diagnostic amplification primers to detect the presence
of a specific microbe in a sample, particularly Streptococcus pneumoniae.
Especially preferred in this regard are ORFs such as those of Table 3, which do not
match previously characterized sequences from other organisms and thus are most
likely to be highly selective for Streptococcus pneumoniae. Also particularly
preferred are ORFs that can be used to distinguish between strains of Streptococcus
pneumoniae, particularly those that distinguish medically important strains, such as
drug-resistant strains.
In addition, the fragments of the present invention, as broadly described,
can be used to control gene expression through triple helix formation or antisense
DNA or RNA, both of which methods are based on the binding of a polynucleotide
sequence to DNA or RNA. Triple helix formation optimally results in a shut-off of
RNA transcription from DNA, while antisense RNA hybridization blocks
translation of an mRNA molecule into polypeptide. Information from the
sequences of the present invention can be used to design antisense and triple
helix-forming oligonucleotides. Polynucleotides suitable for use in these methods
are usually 20 to 40 bases in length and are designed to be complementary to a
region of the gene involved in transcription, for triple-helix formation, or to the
mRNA itself, for antisense inhibition. Both techniques have been demonstrated to be
effective in model systems, and the requisite techniques are well known and
involve routine procedures. Triple helix techniques are discussed in, for example,
Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456
(1988); and Dervan et al., Science 257:1360 (1991). Antisense techniques in
general are discussed in, for instance, Okano, J. Neurochem. 56:560 (1991) and
Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press,
Boca Raton, FL (1988).
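
Designing an antisense oligonucleotide complementary to a region of an mRNA
amounts to taking the reverse complement of that region. The sketch below
illustrates this for the 20 to 40 base range mentioned above; the function name
and the example mRNA are hypothetical, and no claim is made about the efficacy
of any particular oligonucleotide.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(mrna: str, start: int, length: int = 20) -> str:
    """Return the antisense RNA, written 5' to 3', for a window of an mRNA.

    start is the 1-based position of the first base of the window.
    """
    if not 20 <= length <= 40:
        raise ValueError("length should be 20 to 40 bases")
    # Accept DNA-style input by reading T as U.
    window = mrna[start - 1:start - 1 + length].upper().replace("T", "U")
    if start < 1 or len(window) != length:
        raise ValueError("window runs outside the mRNA")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return window.translate(_COMPLEMENT)[::-1]


# Invented mRNA fragment, for illustration only.
mrna = "AUGGCUAAAGGUUCUGACGAAUUCGGC"
print(antisense_rna(mrna, 1, 20))
```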

The present invention further provides recombinant constructs comprising
one or more fragments of the Streptococcus pneumoniae genomic fragments and
contigs of the present invention. Certain preferred recombinant constructs of the
present invention comprise a vector, such as a plasmid or viral vector, into which a
fragment of the Streptococcus pneumoniae genome has been inserted, in a forward
or reverse orientation. In the case of a vector comprising one of the ORFs of the
present invention, the vector may further comprise regulatory sequences, including
for example, a promoter, operably linked to the ORF. For vectors comprising the
EMFs of the present invention, the vector may further comprise a marker sequence
or heterologous ORF operably linked to the EMF.

Large numbers of suitable vectors and promoters are known to those of
skill in the art and are commercially available for generating the recombinant
constructs of the present invention. The following vectors are provided by way of
example. Useful bacterial vectors include phagescript, PsiX174, pBluescript SK,
pBS KS, pNH8a, pNH16a, pNH18a, pNH46a (available from Stratagene);
pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (available from Pharmacia).
Useful eukaryotic vectors include pWLneo, pSV2cat, pOG44, pXT1, pSG
(available from Stratagene); pSVK3, pBPV, pMSG, pSVL (available from
Pharmacia).

Promoter regions can be selected from any desired gene using CAT
(chloramphenicol acetyltransferase) vectors or other vectors with selectable markers.
Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial
promoters include lacI, lacZ, T3, T7, gpt, lambda PR, and trc. Eukaryotic
promoters include CMV immediate early, HSV thymidine kinase, early and late
SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the
appropriate vector and promoter is well within the level of ordinary skill in the art.

The present invention further provides host cells containing any one of the
isolated fragments of the Streptococcus pneumoniae genomic fragments and
contigs of the present invention, wherein the fragment has been introduced into the
host cell using known methods. The host cell can be a higher eukaryotic host
cell, such as a mammalian cell, a lower eukaryotic host cell, such as a yeast cell, or
a procaryotic cell, such as a bacterial cell.

A polynucleotide of the present invention, such as a recombinant construct
comprising an ORF of the present invention, may be introduced into the host by a
variety of well established techniques that are standard in the art, such as calcium
phosphate transfection, DEAE-dextran mediated transfection and electroporation,
which are described in, for instance, Davis, L. et al., BASIC METHODS IN
MOLECULAR BIOLOGY (1986).

A host cell containing one of the fragments of the Streptococcus
pneumoniae genomic fragments and contigs of the present invention can be used
in conventional manners to produce the gene product encoded by the isolated
fragment (in the case of an ORF) or can be used to produce a heterologous protein
under the control of the EMF.

The present invention further provides
isolated polypeptides encoded by the nucleic acid fragments of the present
invention or by degenerate variants of the nucleic acid fragments of the present
invention. By "degenerate variant" is intended nucleotide fragments which differ
from a nucleic acid fragment of the present invention (e.g., an ORF) by nucleotide
sequence but, due to the degeneracy of the Genetic Code, encode an identical
polypeptide sequence.
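
Whether two nucleotide fragments are degenerate variants of one another can be
checked by translating both and comparing the results. The sketch below uses
the standard genetic code; the helper names and the example sequences are
hypothetical and serve only to illustrate the definition given above.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_degenerate_variant(a: str, b: str) -> bool:
    """True if a and b differ in nucleotide sequence but encode the same polypeptide."""
    return a.upper() != b.upper() and translate(a) == translate(b)


# Both invented sequences encode Met-Lys-Gly using different codons.
print(translate("ATGAAAGGT"), is_degenerate_variant("ATGAAAGGT", "ATGAAGGGC"))
```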

Preferred nucleic acid fragments of the present invention are the ORFs and
subfragments thereof depicted in Tables 2 and 3 which encode proteins.

A variety of methodologies known in the art can be utilized to obtain any
one of the isolated polypeptides or proteins of the present invention. At the
simplest level, the amino acid sequence can be synthesized using commercially
available peptide synthesizers. This is particularly useful in producing small
peptides and fragments of larger polypeptides. Such short fragments as may be
obtained most readily by synthesis are useful, for example, in generating antibodies
against the native polypeptide, as discussed further below.

In an alternative method, the polypeptide or protein is purified from
bacterial cells which naturally produce the polypeptide or protein. One skilled in
the art can readily employ well-known methods for isolating polypeptides and
proteins to isolate and purify polypeptides or proteins of the present invention
produced naturally by a bacterial strain, or by other methods. Methods for
isolation and purification that can be employed in this regard include, but are not
limited to, immunochromatography, HPLC, size-exclusion chromatography,
ion-exchange chromatography, and immuno-affinity chromatography.

The polypeptides and proteins of the present invention also can be purified
from cells which have been altered to express the desired polypeptide or protein.
As used herein, a cell is said to be altered to express a desired polypeptide or
protein when the cell, through genetic manipulation, is made to produce a
polypeptide or protein which it normally does not produce or which the cell
normally produces at a lower level. Those skilled in the art can readily adapt
procedures for introducing and expressing either recombinant or synthetic
sequences into eukaryotic or prokaryotic cells in order to generate a cell which
produces one of the polypeptides or proteins of the present invention.

Any host/vector system can be used to express one or more of the ORFs of
the present invention. These include, but are not limited to, eukaryotic hosts such
as HeLa cells, CV-1 cells, COS cells, and Sf9 cells, as well as prokaryotic hosts
such as E. coli and B. subtilis. The most preferred cells are those which do not
normally express the particular polypeptide or protein or which express the
polypeptide or protein at a low natural level.

"Recombinant," as used herein, means that a polypeptide or protein is
derived from recombinant (e.g., microbial or mammalian) expression systems.
"Microbial" refers to recombinant polypeptides or proteins made in bacterial or
fungal (e.g., yeast) expression systems. As a product, "recombinant
microbial" defines a polypeptide or protein essentially free of native endogenous
substances and unaccompanied by associated native glycosylation. Polypeptides or
proteins expressed in most bacterial cultures, e.g., E. coli, will be free of
glycosylation modifications; polypeptides or proteins expressed in yeast will have a
glycosylation pattern different from that expressed in mammalian cells.

"Nucleotide sequence" refers to a heteropolymer of deoxyribonucleotides.
Generally, DNA segments encoding the polypeptides and proteins provided by this
invention are assembled from fragments of the Streptococcus pneumoniae genome
and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a
synthetic gene which is capable of being expressed in a recombinant transcriptional
unit comprising regulatory elements derived from a microbial or viral operon.

"Recombinant expression vehicle or vector" refers to a plasmid or phage or
virus or vector, for expressing a polypeptide from a DNA (RNA) sequence. The
expression vehicle can comprise a transcriptional unit comprising an assembly of
(1) a genetic regulatory element or elements necessary for gene expression in the
host, including elements required to initiate and maintain transcription at a level
sufficient for suitable expression of the desired polypeptide, including, for example,
promoters and, where necessary, an enhancer and a polyadenylation signal; (2) a
structural or coding sequence which is transcribed into mRNA and translated into
protein; and (3) appropriate signals to initiate translation at the beginning of the
desired coding region and terminate translation at its end. Structural units intended
for use in yeast or eukaryotic expression systems preferably include a leader
sequence enabling extracellular secretion of translated protein by a host cell.
Alternatively, where recombinant protein is expressed without a leader or transport
sequence, it may include an N-terminal methionine residue. This residue may or
may not be subsequently cleaved from the expressed recombinant protein to
provide a final product.

"Recombinant expression system" means host cells which have stably 
integrated a recombinant transcriptional unit into chromosomal DNA or carry the
recombinant transcriptional unit extrachromosomally. The cells can be prokaryotic
or eukaryotic. Recombinant expression systems as defined herein will express
heterologous polypeptides or proteins upon induction of the regulatory elements
linked to the DNA segment or synthetic gene to be expressed.

Mature proteins can be expressed in mammalian cells, yeast, bacteria, or
other cells under the control of appropriate promoters. Cell-free translation
systems can also be employed to produce such proteins using RNAs derived from
the DNA constructs of the present invention. Appropriate cloning and expression
vectors for use with prokaryotic and eukaryotic hosts are described in Sambrook et
al., Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, New York (1989), the disclosure of which
is hereby incorporated by reference in its entirety.

Generally, recombinant expression vectors will include origins of
replication and selectable markers permitting transformation of the host cell, e.g.,
the ampicillin resistance gene of E. coli and the S. cerevisiae TRP1 gene, and a
promoter derived from a highly expressed gene to direct transcription of a
downstream structural sequence. Such promoters can be derived from operons
encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK),
alpha-factor, acid phosphatase, or heat shock proteins, among others. The
heterologous structural sequence is assembled in appropriate phase with translation
initiation and termination sequences, and preferably, a leader sequence capable of
directing secretion of translated protein into the periplasmic space or extracellular
medium. Optionally, the heterologous sequence can encode a fusion protein
including an N-terminal identification peptide imparting desired characteristics,
e.g., stabilization or simplified purification of expressed recombinant product.

Useful expression vectors for bacterial use are constructed by inserting a
structural DNA sequence encoding a desired protein together with suitable
translation initiation and termination signals in operable reading phase with a
functional promoter. The vector will comprise one or more phenotypic selectable
markers and an origin of replication to ensure maintenance of the vector and, when
desirable, provide amplification within the host.

Suitable prokaryotic hosts for transformation include strains of E. coli, B.
subtilis, Salmonella typhimurium and various species within the genera
Pseudomonas and Streptomyces. Others may also be employed as a matter of
choice.

As a representative but non-limiting example, useful expression vectors for
bacterial use can comprise a selectable marker and bacterial origin of replication
derived from commercially available plasmids comprising genetic elements of the
well known cloning vector pBR322 (ATCC 37017). Such commercial vectors
include, for example, pKK223-3 (available from Pharmacia Fine Chemicals,
Uppsala, Sweden) and GEM 1 (available from Promega Biotec, Madison, WI,
USA). These pBR322 "backbone" sections are combined with an appropriate
promoter and the structural sequence to be expressed.

Following transformation of a suitable host strain and growth of the host
strain to an appropriate cell density, the selected promoter, where it is inducible, is
derepressed or induced by appropriate means (e.g., temperature shift or chemical
induction) and cells are cultured for an additional period to provide for expression
of the induced gene product. Thereafter cells are typically harvested, generally by
centrifugation, disrupted to release expressed protein, generally by physical or
chemical means, and the resulting crude extract is retained for further purification.

Various mammalian cell culture systems can also be employed to express
recombinant protein. Examples of mammalian expression systems include the
COS-7 lines of monkey kidney fibroblasts, described in Gluzman, Cell 23:175
(1981), and other cell lines capable of expressing a compatible vector, for example,
the C127, 3T3, CHO, HeLa and BHK cell lines.

Mammalian expression vectors will comprise an origin of replication, a
suitable promoter and enhancer, and also any necessary ribosome binding sites,
polyadenylation site, splice donor and acceptor sites, transcriptional termination
sequences, and 5' flanking nontranscribed sequences. DNA sequences derived
from the SV40 viral genome, for example, SV40 origin, early promoter, enhancer,
splice, and polyadenylation sites may be used to provide the required
nontranscribed genetic elements.

Recombinant polypeptides and proteins produced in bacterial culture are
usually isolated by initial extraction from cell pellets, followed by one or more
salting-out, aqueous ion exchange or size exclusion chromatography steps.
Microbial cells employed in expression of proteins can be disrupted by any
convenient method, including freeze-thaw cycling, sonication, mechanical
disruption, or use of cell lysing agents. Protein refolding steps can be used, as
necessary, in completing configuration of the mature protein. Finally, high
performance liquid chromatography (HPLC) can be employed for final purification
steps.

The present invention further includes isolated polypeptides, proteins and
nucleic acid molecules which are substantially equivalent to those herein described.
As used herein, substantially equivalent can refer both to nucleic acid and amino
acid sequences, for example a mutant sequence, that varies from a reference
sequence by one or more substitutions, deletions, or additions, the net effect of
which does not result in an adverse functional dissimilarity between reference and
subject sequences. For purposes of the present invention, sequences having
equivalent biological activity and equivalent expression characteristics are
considered substantially equivalent. For purposes of determining equivalence,
truncation of the mature sequence should be disregarded.

The invention further provides methods of obtaining homologs, from other
strains of Streptococcus pneumoniae, of the fragments of the Streptococcus
pneumoniae genome of the present invention and homologs of the proteins encoded
by the ORFs of the present invention. As used herein, a sequence or protein of
Streptococcus pneumoniae is defined as a homolog of one of the
Streptococcus pneumoniae fragments or contigs, or of a protein encoded by one of the
ORFs of the present invention, if it shares significant homology to one of the
fragments of the Streptococcus pneumoniae genome of the present invention or a
protein encoded by one of the ORFs of the present invention. Specifically, by
using the sequence disclosed herein as a probe or as primers, and techniques such
as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain
homologs.

As used herein, two nucleic acid molecules or proteins are said to "share
significant homology" if the two contain regions which possess greater than 85%
sequence (amino acid or nucleic acid) homology. Preferred homologs in this
regard are those with more than 90% homology. Especially preferred are those
with 93% or more homology. Among especially preferred homologs those with
95% or more homology are particularly preferred. Very particularly preferred
among these are those with 97% and even more particularly preferred among those
are homologs with 99% or more homology. The most preferred homologs among
these are those with 99.9% homology or more. It will be understood that, among
measures of homology, identity is particularly preferred in this regard.
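
The tiers above can be illustrated with a short Python sketch that scores two pre-aligned sequences by percent identity and reports the corresponding tier. This is only an illustration under stated assumptions: the function names are hypothetical, the two sequences are assumed to be already aligned to equal length (with "-" marking gaps), and identity is computed over the full alignment length, a choice that this text does not specify.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.
    Gap columns ('-') never count as matches (an assumption)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(x == y and x != "-" for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def homology_tier(pct: float) -> str:
    """Map a percent value onto the tiers recited in the text above."""
    if pct >= 99.9:
        return "most preferred"
    if pct >= 99.0:
        return "even more particularly preferred"
    if pct >= 97.0:
        return "very particularly preferred"
    if pct >= 95.0:
        return "particularly preferred"
    if pct >= 93.0:
        return "especially preferred"
    if pct > 90.0:
        return "preferred"
    if pct > 85.0:
        return "significant homology"
    return "not significant"
```

For example, two aligned 10-base sequences differing at one position score 90% identity, which exceeds the 85% floor but does not exceed the "more than 90%" threshold for a preferred homolog.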

Region specific primers or probes derived from the nucleotide sequence
provided in SEQ ID NOS: 1-391 or from a nucleotide sequence at least 95%,
particularly at least 99%, especially at least 99.5% identical to a sequence of SEQ
ID NOS: 1-391 can be used to prime DNA synthesis and PCR amplification, as
well as to identify colonies containing cloned DNA encoding a homolog. Methods
suitable to this aspect of the present invention are well known and have been
described in great detail in many publications such as, for example, Innis et al.,
PCR Protocols, Academic Press, San Diego, CA (1990).

When using primers derived from SEQ ID NOS: 1-391 or from a nucleotide
sequence having an aforementioned identity to a sequence of SEQ ID NOS: 1-391,
one skilled in the art will recognize that by employing high stringency conditions
(e.g., annealing at 50-60°C in 6X SSPC and 50% formamide, and washing at
50-65°C in 0.5X SSPC) only sequences which are greater than 75% homologous to
the primer will be amplified. By employing lower stringency conditions (e.g.,
hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at 42°C
in 0.5X SSPC), sequences which are greater than 40-50% homologous to the
primer will also be amplified.

When using DNA probes derived from SEQ ID NOS: 1-391, or from a
nucleotide sequence having an aforementioned identity to a sequence of SEQ ID
NOS: 1-391, for colony/plaque hybridization, one skilled in the art will recognize
that by employing high stringency conditions (e.g., hybridizing at 50-65°C in 5X
SSPC and 50% formamide, and washing at 50-65°C in 0.5X SSPC), sequences
having regions which are greater than 90% homologous to the probe can be
obtained, and that by employing lower stringency conditions (e.g., hybridizing at
35-37°C in 5X SSPC and 40-45% formamide, and washing at 42°C in 0.5X
SSPC), sequences having regions which are greater than 35-45% homologous to
the probe will be obtained.
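
The homology cut-offs recited for primers and probes under the two stringency regimes can be collected into a small lookup, sketched here in Python. The names are hypothetical, and where the text gives a range (40-50%, 35-45%) the lower bound is used as the cut-off, which is an assumption. Actual recovery depends on many experimental factors that this lookup does not model.

```python
# Cut-offs (percent homology to the primer or probe) taken from the text above.
# Where the text gives a range, the lower bound is used (an assumption).
THRESHOLDS = {
    ("primer", "high"): 75.0,
    ("primer", "low"): 40.0,
    ("probe", "high"): 90.0,
    ("probe", "low"): 35.0,
}


def expected_to_be_recovered(reagent: str, stringency: str, pct_homology: float) -> bool:
    """True if a target with the given percent homology exceeds the
    recited cut-off for this reagent ('primer' or 'probe') and
    stringency ('high' or 'low')."""
    return pct_homology > THRESHOLDS[(reagent, stringency)]
```

A target that is 80% homologous would, on these figures, be amplified by a primer at high stringency but would be picked up by a hybridization probe only at lower stringency.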

Any organism can be used as the source for homologs of the present
invention so long as the organism naturally expresses such a protein or contains
genes encoding the same. The most preferred organisms for isolating homologs are
bacteria which are closely related to Streptococcus pneumoniae.

ILLUSTRATIVE USES OF COMPOSITIONS OF THE INVENTION

Each ORF provided in Tables 1 and 2 is identified with a function by
homology to a known gene or polypeptide. As a result, one skilled in the art can
use the polypeptides of the present invention for commercial, therapeutic and
industrial purposes consistent with the type of putative identification of the
polypeptide. Such identifications permit one skilled in the art to use the
Streptococcus pneumoniae ORFs in a manner similar to the known type of
sequences for which the identification is made; for example, to ferment a particular
sugar source or to produce a particular metabolite. A variety of reviews illustrative
of this aspect of the invention are available, including reviews on the
industrial use of enzymes, for example, BIOCHEMICAL ENGINEERING AND
BIOTECHNOLOGY HANDBOOK, 2nd Ed., MacMillan Publications, Ltd. NY
(1991) and BIOCATALYSTS IN ORGANIC SYNTHESES, Tramper et al., Eds.,
Elsevier Science Publishers, Amsterdam, The Netherlands (1985). A variety of
exemplary uses that illustrate this and similar aspects of the present invention are
discussed below.

1. Biosynthetic Enzymes 

Open reading frames encoding proteins involved in mediating the catalytic
reactions involved in intermediary and macromolecular metabolism, the
biosynthesis of small molecules, cellular processes and other functions can be used
for industrial biosynthesis. These include enzymes involved in the degradation of
the intermediary products of metabolism, enzymes involved in central intermediary
metabolism, enzymes involved in respiration, both aerobic and anaerobic, enzymes
involved in fermentation, enzymes involved in ATP proton motive force conversion,
enzymes involved in broad regulatory function, enzymes involved in amino acid
synthesis, enzymes involved in nucleotide synthesis, and enzymes involved in
cofactor and vitamin synthesis.

The various metabolic pathways present in Streptococcus pneumoniae can
be identified based on absolute nutritional requirements as well as by examining the
various enzymes identified in Tables 1-3 and SEQ ID NOS: 1-391.

Of particular interest are polypeptides involved in the degradation of 
intermediary metabolites as well as non-macromolecular metabolism. Such 
enzymes include amylases, glucose oxidases, and catalase. 

Proteolytic enzymes are another class of commercially important enzymes.
Proteolytic enzymes find use in a number of industrial processes including the
processing of flax and other vegetable fibers, in the extraction, clarification and
depectinization of fruit juices, in the extraction of vegetable oil and in the
maceration of fruits and vegetables to give unicellular fruits. A detailed review of
the proteolytic enzymes used in the food industry is provided in Rombouts et al.,
Symbiosis 21:19 (1986) and Voragen et al. in Biocatalysts In Agricultural
Biotechnology, Whitaker et al., Eds., American Chemical Society Symposium
Series 389:93 (1989).

The metabolism of sugars is an important aspect of the primary metabolism
of Streptococcus pneumoniae. Enzymes involved in the degradation of sugars,
such as, particularly, glucose, galactose, fructose and xylose, can be used in
industrial fermentation. Some of the important sugar transforming enzymes, from
a commercial viewpoint, include sugar isomerases such as glucose isomerase.
Other metabolic enzymes have found commercial use, such as glucose oxidases
which produce ketogulonic acid (KGA). KGA is an intermediate in the
commercial production of ascorbic acid using the Reichstein procedure, as
described in Krueger et al., Biotechnology 6(A), Rhine et al., Eds., Verlag Press,
Weinheim, Germany (1984).

Glucose oxidase (GOD) is commercially available and has been used in
purified form as well as in an immobilized form for the deoxygenation of beer.
See, for instance, Hartmeir et al., Biotechnology Letters 1:21 (1979). The most
important application of GOD is the industrial scale fermentation of gluconic acid.
There is a market for gluconic acids, which are used in the detergent, textile, leather,
photographic, pharmaceutical, food, feed and concrete industries, as described, for
example, in Bigelis et al., beginning on page 357 in GENE MANIPULATIONS
AND FUNGI, Benett et al., Eds., Academic Press, New York (1985). In addition
to industrial applications, GOD has found applications in medicine for quantitative
determination of glucose in body fluids and, more recently, in biotechnology for
analyzing syrups from starch and cellulose hydrolysates. This application is
described in Owusu et al., Biochem. et Biophysica. Acta. 572:83 (1986), for instance.

The main sweetener used in the world today is sugar which comes from
sugar beets and sugar cane. In the field of industrial enzymes, the glucose
isomerase process shows the largest expansion in the market today. Initially,
soluble enzymes were used and later immobilized enzymes were developed
(Krueger et al., Biotechnology, The Textbook of Industrial Microbiology, Sinauer
Associated Incorporated, Sunderland, Massachusetts (1990)). Today, the use of
glucose-produced high fructose syrups is by far the largest industrial business
using immobilized enzymes. A review of the industrial use of these enzymes is
provided by Jorgensen, Starch 40:301 (1988).

Proteinases, such as alkaline serine proteinases, are used as detergent
additives and thus represent one of the largest volumes of microbial enzymes used
in the industrial sector. Because of their industrial importance, there is a large body
of published and unpublished information regarding the use of these enzymes in
industrial processes. (See Faultman et al., Acid Proteases Structure Function and
Biology, Tang, J., ed., Plenum Press, New York (1977) and Godfrey et al.,
Industrial Enzymes, MacMillan Publishers, Surrey, UK (1983) and Hepner et al.,
Report Industrial Enzymes by 1990, Hel Hepner & Associates, London (1986)).

Another class of commercially usable proteins of the present invention are
the microbial lipases, described by, for instance, Macrae et al., Philosophical
Transactions of the Royal Society of London 310:227 (1985) and Poserke, Journal
of the American Oil Chemist Society 61:1758 (1984). A major use of lipases is in
the fat and oil industry for the production of neutral glycerides using lipase
catalyzed inter-esterification of readily available triglycerides. Applications of
lipases include use as a detergent additive to facilitate the removal of fats from
fabrics in the course of the washing procedures.

The use of enzymes, and in particular microbial enzymes, as catalysts for
key steps in the synthesis of complex organic molecules is gaining popularity at a
great rate. One area of great interest is the preparation of chiral intermediates.
Preparation of chiral intermediates is of interest to a wide range of synthetic
chemists, particularly those scientists involved with the preparation of new
pharmaceuticals, agrochemicals, fragrances and flavors. (See Davies et al., Recent
Advances in the Generation of Chiral Intermediates Using Enzymes, CRC Press,
Boca Raton, Florida (1990)). The following reactions catalyzed by enzymes are of
interest to organic chemists: hydrolysis of carboxylic acid esters, phosphate esters,
amides and nitriles, esterification reactions, trans-esterification reactions, synthesis
of amides, reduction of alkanones and oxoalkanoates, oxidation of alcohols to
carbonyl compounds, oxidation of sulfides to sulfoxides, and carbon bond forming
reactions such as the aldol reaction.

When considering the use of an enzyme encoded by one of the ORFs of the
present invention for biotransformation and organic synthesis it is sometimes
necessary to consider the respective advantages and disadvantages of using a
microorganism as opposed to an isolated enzyme. The pros and cons of using a whole
cell system on the one hand or an isolated partially purified enzyme on the other
hand have been described in detail by Bud et al., Chemistry in Britain (1987), p.
127.

Amino transferases, enzymes involved in the biosynthesis and metabolism
of amino acids, are useful in the catalytic production of amino acids. The
advantage of using microbial based enzyme systems is that the amino transferase
enzymes catalyze the stereoselective synthesis of only L-amino acids and
generally possess uniformly high catalytic rates. A description of the use of amino
transferases for amino acid production is provided by Roselle-David, Methods of
Enzymology 756:479 (1987).

Another category of useful proteins encoded by the ORFs of the present
invention includes enzymes involved in nucleic acid synthesis, repair, and
recombination.

2. Generation of Antibodies 

As described here, the proteins of the present invention, as well as
homologs thereof, can be used in a variety of procedures and methods known in
the art which are currently applied to other proteins. The proteins of the present
invention can further be used to generate an antibody which selectively binds the
protein. Such antibodies can be either monoclonal or polyclonal antibodies, as well
as fragments of these antibodies, and humanized forms.

The invention further provides antibodies which selectively bind to one of 
the proteins of the present invention and hybridomas which produce these 
antibodies. A hybridoma is an immortalized cell line which is capable of secreting 
a specific monoclonal antibody. 

In general, techniques for preparing polyclonal and monoclonal antibodies
as well as hybridomas capable of producing the desired antibody are well known in
the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory
Techniques In Biochemistry And Molecular Biology, Elsevier Science Publishers,
Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol. Methods
35:1-21 (1980); Kohler and Milstein, Nature 256:495-497 (1975)), as are the trioma
technique and the human B-cell hybridoma technique (Kozbor et al., Immunology
Today 4:12 (1983); pgs. 77-96 of Cole et al., in Monoclonal Antibodies And
Cancer Therapy, Alan R. Liss, Inc. (1985)). Any animal (mouse, rabbit,
etc.) which is known to produce antibodies can be immunized with the pseudogene
polypeptide. Methods for immunization are well known in the art. Such methods
include subcutaneous or intraperitoneal injection of the polypeptide. One skilled in
the art will recognize that the amount of the protein encoded by the ORF of the
present invention used for immunization will vary based on the animal which is
immunized, the antigenicity of the peptide and the site of injection.

The protein which is used as an immunogen may be modified or
administered in an adjuvant in order to increase the protein's antigenicity. Methods
of increasing the antigenicity of a protein are well known in the art and include, but
are not limited to, coupling the antigen with a heterologous protein (such as globulin
or galactosidase) or through the inclusion of an adjuvant during immunization.

For monoclonal antibodies, spleen cells from the immunized animals are
removed, fused with myeloma cells, such as SP2/0-Ag14 myeloma cells, and
allowed to become monoclonal antibody producing hybridoma cells.

Any one of a number of methods well known in the art can be used to
identify the hybridoma cell which produces an antibody with the desired
characteristics. These include screening the hybridomas with an ELISA assay,
western blot analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res.
775:109-124 (1988)).

Hybridomas secreting the desired antibodies are cloned and the class and
subclass are determined using procedures known in the art (Campbell, A. M.,
Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and
Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands
(1984)).

Techniques described for the production of single chain antibodies (U.S.
Patent 4,946,778) can be adapted to produce single chain antibodies to proteins of
the present invention.

For polyclonal antibodies, antibody containing antiserum is isolated from the
immunized animal and is screened for the presence of antibodies with the desired
specificity using one of the above-described procedures.

The present invention further provides the above-described antibodies in
detectably labelled form. Antibodies can be detectably labelled through the use of
radioisotopes, affinity labels (such as biotin, avidin, etc.), enzymatic labels (such
as horseradish peroxidase, alkaline phosphatase, etc.), fluorescent labels (such as
FITC or rhodamine, etc.), paramagnetic atoms, etc. Procedures for accomplishing
such labeling are well known in the art (for example, see Sternberger et al., J.
Histochem. Cytochem. 75:315 (1970); Bayer, E. A. et al., Meth. Enzym. 62:308
(1979); Engval, E. et al., Immunol. 109:129 (1972); Goding, J. W., J. Immunol.
Meth. 75:215 (1976)).

The labeled antibodies of the present invention can be used for in vitro, in
vivo, and in situ assays to identify cells or tissues in which a fragment of the
Streptococcus pneumoniae genome is expressed.

The present invention further provides the above-described antibodies
immobilized on a solid support. Examples of such solid supports include plastics
such as polycarbonate, complex carbohydrates such as agarose and sepharose,
acrylic resins such as polyacrylamide, and latex beads. Techniques for
coupling antibodies to such solid supports are well known in the art (Weir, D. M.
et al., "Handbook of Experimental Immunology" 4th Ed., Blackwell Scientific
Publications, Oxford, England, Chapter 10 (1986); Jacoby, W. D. et al., Meth.
Enzym. 34 Academic Press, N. Y. (1974)). The immobilized antibodies of the
present invention can be used for in vitro, in vivo, and in situ assays as well as for
immunoaffinity purification of the proteins of the present invention.

3. Diagnostic Assays and Kits 

The present invention further provides methods to identify the expression
of one of the ORFs of the present invention, or homolog thereof, in a test sample,
using one of the DFs or antibodies of the present invention.

In detail, such methods comprise incubating a test sample with one or more 
of the antibodies or one or more of the DFs of the present invention and assaying 
for binding of the DFs or antibodies to components within the test sample. 

Conditions for incubating a DF or antibody with a test sample vary.
Incubation conditions depend on the format employed in the assay, the detection
methods employed, and the type and nature of the DF or antibody used in the
assay. One skilled in the art will recognize that any one of the commonly available
hybridization, amplification or immunological assay formats can readily be adapted
to employ the DFs or antibodies of the present invention. Examples of such assays
can be found in Chard, T., An Introduction to Radioimmunoassay and Related
Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands (1986);
Bullock, G. R. et al., Techniques in Immunocytochemistry, Academic Press,
Orlando, FL Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, P., Practice and
Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and
Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands
(1985).

The test samples of the present invention include cells, protein or membrane
extracts of cells, or biological fluids such as sputum, blood, serum, plasma, or
urine. The test sample used in the above-described method will vary based on the
assay format, nature of the detection method and the tissues, cells or extracts used
as the sample to be assayed. Methods for preparing protein extracts or membrane
extracts of cells are well known in the art and can readily be adapted in order to
obtain a sample which is compatible with the system utilized.

In another embodiment of the present invention, kits are provided which
contain the necessary reagents to carry out the assays of the present invention.

Specifically, the invention provides a compartmentalized kit to receive, in
close confinement, one or more containers, which comprises: (a) a first container
comprising one of the DFs or antibodies of the present invention; and (b) one or
more other containers comprising one or more of the following: wash reagents, and
reagents capable of detecting the presence of a bound DF or antibody.

In detail, a compartmentalized kit includes any kit in which reagents are
contained in separate containers. Such containers include small glass containers,
plastic containers or strips of plastic or paper. Such containers allow one to
efficiently transfer reagents from one compartment to another compartment such
that the samples and reagents are not cross-contaminated, and the agents or
solutions of each container can be added in a quantitative fashion from one
compartment to another. Such containers will include a container which will accept
the test sample, a container which contains the antibodies used in the assay,
containers which contain wash reagents (such as phosphate buffered saline,
Tris-buffers, etc.), and containers which contain the reagents used to detect the bound
antibody or DF.

Types of detection reagents include labelled nucleic acid probes, labelled
secondary antibodies, or in the alternative, if the primary antibody is labelled, the
enzymatic or antibody binding reagents which are capable of reacting with the
labelled antibody. One skilled in the art will readily recognize that the disclosed
DFs and antibodies of the present invention can be readily incorporated into one of
the established kit formats which are well known in the art.

4. Screening Assay for Binding Agents

Using the isolated proteins of the present invention, the present invention
further provides methods of obtaining and identifying agents which bind to a
protein encoded by one of the ORFs of the present invention or to one of the
Streptococcus pneumoniae fragments and contigs herein
described.

In general, such methods comprise steps of: 

(a) contacting an agent with an isolated protein encoded by one of the 
ORFs of the present invention, or an isolated fragment of the Streptococcus 
pneumoniae genome; and 

(b) determining whether the agent binds to said protein or said fragment.

The agents screened in the above assay can be, but are not limited to, 
peptides, carbohydrates, vitamin derivatives, or other pharmaceutical agents. The 
agents can be selected and screened at random or rationally selected or designed 
using protein modeling techniques. 

For random screening, agents such as peptides, carbohydrates,
pharmaceutical agents and the like are selected at random and are assayed for their
ability to bind to the protein encoded by the ORF of the present invention.

Alternatively, agents may be rationally selected or designed. As used
herein, an agent is said to be "rationally selected or designed" when the agent is
chosen based on the configuration of the particular protein. For example, one
skilled in the art can readily adapt currently available procedures to generate
peptides, pharmaceutical agents and the like capable of binding to a specific peptide
sequence in order to generate rationally designed antipeptide peptides, for example
see Hurby et al., "Application of Synthetic Peptides: Antisense Peptides," in
Synthetic Peptides, A User's Guide, W. H. Freeman, NY (1992), pp. 289-307,
and Kaspczak et al., Biochemistry 25:9230-8 (1989), or pharmaceutical agents, or
the like.

In addition to the foregoing, one class of agents of the present invention, as
broadly described, can be used to control gene expression through binding to one
of the ORFs or EMFs of the present invention. As described above, such agents
can be randomly screened or rationally designed/selected. Targeting the ORF or
EMF allows a skilled artisan to design sequence specific or element specific agents,
modulating the expression of either a single ORF or multiple ORFs which rely on
the same EMF for expression control.

One class of DNA binding agents are agents which contain base residues
which hybridize or form a triple helix by binding to DNA or RNA. Such agents
can be based on the classic phosphodiester, ribonucleic acid backbone, or can be a
variety of sulfhydryl or polymeric derivatives which have base attachment capacity.

Agents suitable for use in these methods usually contain 20 to 40 bases and
are designed to be complementary to a region of the gene involved in transcription
(triple helix - see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al.,
Science 241:456 (1988); and Dervan et al., Science 257:1360 (1991)) or to the
mRNA itself (antisense - Okano, J. Neurochem. 56:560 (1991);
Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press,
Boca Raton, FL (1988)). Triple helix formation optimally results in a shut-off of
RNA transcription from DNA, while antisense RNA hybridization blocks
translation of an mRNA molecule into polypeptide. Both techniques have been
demonstrated to be effective in model systems. Information contained in the
sequences of the present invention can be used to design antisense and triple
helix-forming oligonucleotides, and other DNA binding agents.
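
As a rough illustration of how sequence information could be used to enumerate candidate antisense oligonucleotides of 20 to 40 bases, the following Python sketch slides a window along a target sequence and emits the reverse complement of each window. The function names, the DNA alphabet used for the oligonucleotides, and the sliding-window strategy are assumptions made for illustration; the sketch does not address triple helix design, secondary structure, or specificity.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def antisense_oligos(target: str, length: int = 20, step: int = 1) -> list[tuple[int, str]]:
    """Enumerate (start_index, oligo) pairs, where each oligo is the
    reverse complement of a `length`-base window of the target.
    The target may be given as mRNA (U) or as its DNA equivalent (T)."""
    if not 20 <= length <= 40:
        raise ValueError("length should be within the 20 to 40 base range")
    sense = target.upper().replace("U", "T")
    return [
        (start, reverse_complement(sense[start:start + length]))
        for start in range(0, len(sense) - length + 1, step)
    ]
```

Each returned oligonucleotide is complementary to the corresponding stretch of the target, so the window starting at the initiation codon yields an oligo whose 3' end reads CAT.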

5. Pharmaceutical Compositions and Vaccines 

The present invention further provides pharmaceutical agents which can be
used to modulate the growth or pathogenicity of Streptococcus pneumoniae, or
another related organism, in vivo or in vitro. As used herein, a "pharmaceutical
agent" is defined as a composition of matter which can be formulated using known
techniques to provide a pharmaceutical composition. As used herein, the
"pharmaceutical agents of the present invention" refers to the pharmaceutical agents
which are derived from the proteins encoded by the ORFs of the present invention
or are agents which are identified using the herein described assays.

As used herein, a pharmaceutical agent is said to "modulate the growth or
pathogenicity of Streptococcus pneumoniae or a related organism, in vivo or in
vitro" when the agent reduces the rate of growth, rate of division, or viability of
the organism in question. The pharmaceutical agents of the present invention can
modulate the growth or pathogenicity of an organism in many fashions, although
an understanding of the underlying mechanism of action is not needed to practice
the use of the pharmaceutical agents of the present invention. Some agents will
modulate the growth by binding to an important protein thus blocking the biological
activity of the protein, while other agents may bind to a component of the outer
surface of the organism, blocking attachment or rendering the organism more
susceptible to attack by the body's natural immune system. Alternatively, the agent
may comprise a protein encoded by one of the ORFs of the present invention and
serve as a vaccine. The development and use of a vaccine based on outer membrane
components are well known in the art.

As used herein, a "related organism" is a broad term which refers to any
organism whose growth can be modulated by one of the pharmaceutical agents of
the present invention. In general, such an organism will contain a homolog of the
protein which is the target of the pharmaceutical agent or the protein used as a
vaccine. As such, related organisms do not need to be bacterial but may be fungal
or viral pathogens.

The pharmaceutical agents and compositions of the present invention may
be administered in a convenient manner, such as by the oral, topical, intravenous,
intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes. The
pharmaceutical compositions are administered in an amount which is effective for
treating and/or prophylaxis of the specific indication. In general, they are
administered in an amount of at least about 1 mg/kg body weight and in most cases
they will be administered in an amount not in excess of about 1 g/kg body weight
per day. In most cases, the dosage is from about 0.1 mg/kg to about 10 g/kg body
weight daily, taking into account the routes of administration, symptoms, etc.

The agents of the present invention can be used in native form or can be
modified to form a chemical derivative. As used herein, a molecule is said to be a
"chemical derivative" of another molecule when it contains additional chemical
moieties not normally a part of the molecule. Such moieties may improve the
molecule's solubility, absorption, biological half-life, etc. The moieties may
alternatively decrease the toxicity of the molecule, eliminate or attenuate any
undesirable side effect of the molecule, etc. Moieties capable of mediating such
effects are disclosed in, among other sources, REMINGTON'S
PHARMACEUTICAL SCIENCES (1980) cited elsewhere herein.

For example, such moieties may change an immunological character of the
functional derivative, such as affinity for a given antibody. Such changes in
immunomodulation activity are measured by the appropriate assay, such as a
competitive type immunoassay. Modifications of such protein properties as redox
or thermal stability, biological half-life, hydrophobicity, susceptibility to proteolytic
degradation or the tendency to aggregate with carriers or into multimers also may
be effected in this way and can be assayed by methods well known to the skilled
artisan.

The therapeutic effects of the agents of the present invention may be
obtained by providing the agent to a patient by any suitable means (e.g., by inhalation,
intravenously, intramuscularly, subcutaneously, enterally, or parenterally). It is
preferred to administer the agent of the present invention so as to achieve an
effective concentration within the blood or tissue in which the growth of the
organism is to be controlled. To achieve an effective blood concentration, the
preferred method is to administer the agent by injection. The administration may be
by continuous infusion, or by single or multiple injections.

In providing a patient with one of the agents of the present invention, the
dosage of the administered agent will vary depending upon such factors as the
patient's age, weight, height, sex, general medical condition, previous medical
history, etc. In general, it is desirable to provide the recipient with a dosage of
agent which is in the range of from about 1 pg/kg to 10 mg/kg (body weight of
patient), although a lower or higher dosage may be administered. The
therapeutically effective dose can be lowered by using combinations of the agents
of the present invention or another agent.

As used herein, two or more compounds or agents are said to be
administered "in combination" with each other when either (1) the physiological
effects of each compound, or (2) the serum concentrations of each compound can
be measured at the same time. The composition of the present invention can be
administered concurrently with, prior to, or following the administration of the
other agent.

The agents of the present invention are intended to be provided to recipient
subjects in an amount sufficient to decrease the rate of growth (as defined above) of
the target organism.

The administration of the agent(s) of the invention may be for either a
"prophylactic" or "therapeutic" purpose. When provided prophylactically, the
agent(s) are provided in advance of any symptoms indicative of the organism's
growth. The prophylactic administration of the agent(s) serves to prevent,
attenuate, or decrease the rate of onset of any subsequent infection. When
provided therapeutically, the agent(s) are provided at (or shortly after) the onset of
an indication of infection. The therapeutic administration of the compound(s)
serves to attenuate the pathological symptoms of the infection and to increase the
rate of recovery.

WO 98/18931    PCT/US97/19588

The agents of the present invention are administered to a subject, such as a
mammal, or a patient, in a pharmaceutically acceptable form and in a therapeutically
effective concentration. A composition is said to be "pharmacologically acceptable"
if its administration can be tolerated by a recipient patient. Such an agent is said to
be administered in a "therapeutically effective amount" if the amount administered
is physiologically significant. An agent is physiologically significant if its presence
results in a detectable change in the physiology of a recipient patient.

The agents of the present invention can be formulated according to known
methods to prepare pharmaceutically useful compositions, whereby these materials,
or their functional derivatives, are combined in a mixture with a pharmaceutically
acceptable carrier vehicle. Suitable vehicles and their formulation, inclusive of
other human proteins, e.g., human serum albumin, are described, for example, in
REMINGTON'S PHARMACEUTICAL SCIENCES, 16th Ed., Osol, A., Ed.,
Mack Publishing, Easton PA (1980). In order to form a pharmaceutically
acceptable composition suitable for effective administration, such compositions will
contain an effective amount of one or more of the agents of the present invention,
together with a suitable amount of carrier vehicle.

Additional pharmaceutical methods may be employed to control the duration
of action. Controlled release preparations may be achieved through the use of
polymers to complex or absorb one or more of the agents of the present invention.
The controlled delivery may be effectuated by a variety of well known techniques,
including formulation with macromolecules such as, for example, polyesters,
polyamino acids, polyvinylpyrrolidone, ethylenevinylacetate, methylcellulose,
carboxymethylcellulose, or protamine sulfate, adjusting the concentration of the
macromolecules and the agent in the formulation, and by appropriate use of
methods of incorporation, which can be manipulated to effectuate a desired time
course of release. Another possible method to control the duration of action by
controlled release preparations is to incorporate agents of the present invention into
particles of a polymeric material such as polyesters, polyamino acids, hydrogels,
poly(lactic acid) or ethylene vinylacetate copolymers. Alternatively, instead of
incorporating these agents into polymeric particles, it is possible to entrap these
materials in microcapsules prepared, for example, by coacervation techniques or by
interfacial polymerization with, for example, hydroxymethylcellulose or gelatin
microcapsules and poly(methylmethacrylate) microcapsules, respectively, or in
colloidal drug delivery systems, for example, liposomes, albumin microspheres,
microemulsions, nanoparticles, and nanocapsules or in macroemulsions. Such
techniques are disclosed in REMINGTON'S PHARMACEUTICAL SCIENCES
(1980).

The invention further provides a pharmaceutical pack or kit comprising one
or more containers filled with one or more of the ingredients of the pharmaceutical
compositions of the invention. Associated with such container(s) can be a notice in
the form prescribed by a governmental agency regulating the manufacture, use or
sale of pharmaceuticals or biological products, which notice reflects approval by
the agency of manufacture, use or sale for human administration.

In addition, the agents of the present invention may be employed in 
conjunction with other therapeutic compounds. 

6. Shot-Gun Approach to Megabase DNA Sequencing

The present invention further demonstrates that a large sequence can be
sequenced using a random shotgun approach. This procedure, described in detail
in the examples that follow, has eliminated the up-front cost of isolating and
ordering overlapping or contiguous subclones prior to the start of the sequencing
protocols.

Certain aspects of the present invention are described in greater detail in the
examples that follow. The examples are provided by way of illustration. Other
aspects and embodiments of the present invention are contemplated by the
inventors, as will be clear to those of skill in the art from reading the present
disclosure.

ILLUSTRATIVE EXAMPLES

LIBRARIES AND SEQUENCING
1. Shotgun Sequencing Probability Analysis

The overall strategy for a shotgun approach to whole genome sequencing
follows from the Lander and Waterman (Lander and Waterman, Genomics
2:231 (1988)) application of the equation for the Poisson distribution. According
to this treatment, the probability, $P_0$, that any given base in a sequence of size L, in
nucleotides, is not sequenced after a certain amount, n, in nucleotides, of random
sequence has been determined can be calculated by the equation $P_0 = e^{-m}$, where m
is n/L, the fold coverage. For instance, for a genome of 2.8 Mb, m = 1 when 2.8
Mb of sequence has been randomly generated (1X coverage). At that point,
$P_0 = e^{-1} = 0.37$. The probability that any given base has not been sequenced is the same
as the probability that any region of the whole sequence L has not been determined
and, therefore, is equivalent to the fraction of the whole sequence that has yet to be
determined. Thus, at one-fold coverage, approximately 37% of a polynucleotide of
size L, in nucleotides, has not been sequenced. When 14 Mb of sequence has been
generated, coverage is 5X for a 2.8 Mb sequence and the unsequenced fraction drops to
0.0067 or 0.67%. 5X coverage of a 2.8 Mb sequence can be attained by sequencing
approximately 17,000 random clones from both insert ends with an average
sequence read length of 410 bp.

Similarly, the total gap length, G, is determined by the equation $G = Le^{-m}$,
and the average gap size, g, follows the equation $g = L/N$, where N is the number
of random sequence reads. Thus, 5X coverage (about 34,000 reads of 410 bp)
leaves about 240 gaps averaging about 82 bp in size in a sequence of a
polynucleotide 2.8 Mb long.

The treatment above is essentially that of Lander and Waterman, Genomics 
2:231 (1988). 
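
The estimates above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `shotgun_stats` and its dictionary keys are hypothetical, and the average gap size is taken as the sequence length divided by the number of reads, as in the Lander and Waterman treatment.

```python
import math

def shotgun_stats(genome_bp, read_len_bp, n_reads):
    """Lander-Waterman style estimates for a random shotgun project."""
    total_bp = read_len_bp * n_reads      # n: total random sequence, in nucleotides
    m = total_bp / genome_bp              # fold coverage, m = n / L
    p0 = math.exp(-m)                     # fraction of bases not yet sequenced
    total_gap_bp = genome_bp * p0         # G = L * e^(-m)
    avg_gap_bp = genome_bp / n_reads      # g = L / (number of reads)
    n_gaps = total_gap_bp / avg_gap_bp    # equivalent to n_reads * e^(-m)
    return {
        "coverage": m,
        "unsequenced_fraction": p0,
        "total_gap_bp": total_gap_bp,
        "avg_gap_bp": avg_gap_bp,
        "n_gaps": n_gaps,
    }

# 17,000 clones read from both insert ends at an average of 410 bp per read
stats = shotgun_stats(genome_bp=2_800_000, read_len_bp=410, n_reads=2 * 17_000)
for name, value in stats.items():
    print(f"{name}: {value:.4g}")
```

With these inputs the sketch gives close to 5X coverage, an unsequenced fraction just under 0.7%, and a few hundred gaps averaging about 82 bp, in line with the figures quoted above.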

2. Random Library Construction

In order to approximate the random model described above during actual 
sequencing, a nearly ideal library of cloned genomic fragments is required. The 
following library construction procedure was developed to achieve this end. 

Streptococcus pneumoniae DNA is prepared by phenol extraction. A
mixture containing 200 µg DNA in 1.0 ml of 300 mM sodium acetate, 10 mM Tris-HCl,
1 mM Na-EDTA, 50% glycerol is processed through a nebulizer (IPI Medical
Products) with a stream of nitrogen adjusted to 35 kPa for 2 minutes. The
nebulized DNA is ethanol precipitated and redissolved in 500 µl TE buffer.

To create blunt ends, a 100 µl aliquot of the resuspended DNA is digested
with 5 units of BAL31 nuclease (New England BioLabs) for 10 min at 30°C in 200
µl BAL31 buffer. The digested DNA is phenol-extracted, ethanol-precipitated,
redissolved in 100 µl TE buffer, and then size-fractionated by electrophoresis
through a 1.0% low melting temperature agarose gel. The section containing DNA
fragments 1.6-2.0 kb in size is excised from the gel, the LGT agarose is melted,
and the resulting solution is extracted with phenol to separate the agarose from the
DNA. DNA is ethanol precipitated and redissolved in 20 µl of TE buffer for
ligation to vector.

A two-step ligation procedure is used to produce a plasmid library with
97% inserts, of which >99% are single inserts. The first ligation mixture (50 µl)
contains 2 µg of DNA fragments, 2 µg pUC18 DNA (Pharmacia) cut with SmaI
and dephosphorylated with bacterial alkaline phosphatase, and 10 units of T4 ligase
(GIBCO/BRL) and is incubated at 14°C for 4 hr. The ligation mixture then is
phenol extracted and ethanol precipitated, and the precipitated DNA is dissolved in
20 µl TE buffer and electrophoresed on a 1.0% low melting agarose gel. Discrete
bands in a ladder are visualized by ethidium bromide staining and UV illumination
and identified by size as insert (I), vector (v), v+I, v+2I, v+3I, etc. The portion of
the gel containing v+I DNA is excised and the v+I DNA is recovered and
resuspended into 20 µl TE. The v+I DNA then is blunt-ended by T4 polymerase
treatment for 5 min. at 37°C in a reaction mixture (50 µl) containing the v+I linears,
500 µM each of the 4 dNTPs, and 9 units of T4 polymerase (New England
BioLabs), under recommended buffer conditions. After phenol extraction and
ethanol precipitation the repaired v+I linears are dissolved in 20 µl TE. The final
ligation to produce circles is carried out in a 50 µl reaction containing 5 µl of v+I
linears and 5 units of T4 ligase at 14°C overnight. After 10 min. at 70°C the
following day, the reaction mixture is stored at -20°C.

This two-stage procedure results in a molecularly random collection of 
single-insert plasmid recombinants with minimal contamination from double-insert 
chimeras (<1%) or free vector (<3%). 

Since deviation from randomness can arise from propagation of the DNA in
the host, E. coli host cells deficient in all recombination and restriction functions
(A. Greener, Strategies 3 (1):5 (1990)) are used to prevent rearrangements,
deletions, and loss of clones by restriction. Furthermore, transformed cells are
plated directly on antibiotic diffusion plates to avoid the usual broth recovery phase
which allows multiplication and selection of the most rapidly growing cells.

Plating is carried out as follows. A 100 µl aliquot of Epicurian Coli SURE
II Supercompetent Cells (Stratagene 200152) is thawed on ice and transferred to a
chilled Falcon 2059 tube on ice. A 1.7 µl aliquot of 1.42 M beta-mercaptoethanol
is added to the aliquot of cells to a final concentration of 25 mM. Cells are
incubated on ice for 10 min. A 1 µl aliquot of the final ligation is added to the cells
and incubated on ice for 30 min. The cells are heat pulsed for 30 sec. at 42°C and
placed back on ice for 2 min. The outgrowth period in liquid culture is eliminated
from this protocol in order to minimize the preferential growth of any given
transformed cell. Instead the transformation mixture is plated directly on a nutrient
rich SOB plate containing a 5 ml bottom layer of SOB agar (5% SOB agar: 20 g
tryptone, 5 g yeast extract, 0.5 g NaCl, 1.5% Difco Agar per liter of media). The 5
ml bottom layer is supplemented with 0.4 ml of 50 mg/ml ampicillin per 100 ml
SOB agar. The 15 ml top layer of SOB agar is supplemented with 1 ml X-Gal
(2%), 1 ml MgCl2 (1 M), and 1 ml MgSO4 per 100 ml SOB agar. The 15 ml top layer
is poured just prior to plating. Our titer is approximately 100 colonies/10 µl aliquot
of transformation.
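
As a quick consistency check on the beta-mercaptoethanol step, the final concentration follows from simple dilution arithmetic. The helper below is a hypothetical sketch, and it assumes the only volumes involved are the 1.7 µl of stock and the 100 µl of cells.

```python
def final_concentration_mM(stock_M, stock_ul, sample_ul):
    """Concentration (mM) after adding stock_ul of stock to sample_ul of sample."""
    total_ul = stock_ul + sample_ul
    return stock_M * 1000.0 * stock_ul / total_ul

# 1.7 µl of 1.42 M beta-mercaptoethanol added to a 100 µl aliquot of cells
bme_mM = final_concentration_mM(stock_M=1.42, stock_ul=1.7, sample_ul=100.0)
print(f"final beta-mercaptoethanol concentration: {bme_mM:.1f} mM")
```

The result is a little under 24 mM, which agrees with the stated final concentration of about 25 mM.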

All colonies are picked for template preparation regardless of size. Thus, 
only clones lost due to "poison" DNA or deleterious gene products are deleted from 
the library, resulting in a slight increase in gap number over that expected. 

3. Random DNA Sequencing

High quality double stranded DNA plasmid templates are prepared using a
"boiling bead" method developed in collaboration with Advanced Genetic
Technology Corp. (Gaithersburg, MD) (Adams et al., Science 252:1651 (1991);
Adams et al., Nature 355:632 (1992)). Plasmid preparation is performed in a
96-well format for all stages of DNA preparation from bacterial growth through final
DNA purification. Template concentration is determined using Hoechst Dye and a
Millipore Cytofluor. DNA concentrations are not adjusted, but low-yielding
templates are identified where possible and not sequenced.

Templates are also prepared from two Streptococcus pneumoniae lambda
genomic libraries. An amplified library is constructed in the vector Lambda GEM-12
(Promega) and an unamplified library is constructed in Lambda DASH II
(Stratagene). In particular, for the unamplified lambda library, Streptococcus
pneumoniae DNA (> 100 kb) is partially digested in a reaction mixture (200 µl)
containing 50 µg DNA, 1X Sau3AI buffer, 20 units Sau3AI for 6 min. at 23°C.
The digested DNA is phenol-extracted and electrophoresed on a 0.5% low
melting agarose gel at 2 V/cm for 7 hours. Fragments from 15 to 25 kb are excised
and recovered in a final volume of 6 µl. One µl of fragments is used with 1 µl of
DASH II vector (Stratagene) in the recommended ligation reaction. One µl of the
ligation mixture is used per packaging reaction following the recommended
protocol with the Gigapack II XL Packaging Extract (Stratagene, #227711). Phage
are plated directly without amplification from the packaging mixture (after dilution
with 500 µl of recommended SM buffer and chloroform treatment). Yield is about
2.5 × 10^3 pfu/µl. The amplified library is prepared essentially as above except the
lambda GEM-12 vector is used. After packaging, about 3.5 × 10^4 pfu are plated on
the restrictive NM539 host. The lysate is harvested in 2 ml of SM buffer and
stored frozen in 7% dimethylsulfoxide. The phage titer is approximately 1 × 10^9
pfu/ml.

Liquid lysates (100 µl) are prepared from randomly selected plaques (from
the unamplified library) and template is prepared by long-range PCR using T7 and
T3 vector-specific primers.

Sequencing reactions are carried out on plasmid and/or PCR templates
using the AB Catalyst LabStation with Applied Biosystems PRISM Ready
Reaction Dye Primer Cycle Sequencing Kits for the M13 forward (M13-21) and
the M13 reverse (M13RP1) primers (Adams et al., Nature 368:474 (1994)). Dye
terminator sequencing reactions are carried out on the lambda templates on a
Perkin-Elmer 9600 Thermocycler using the Applied Biosystems Ready Reaction
Dye Terminator Cycle Sequencing kits. T7 and SP6 primers are used to sequence
the ends of the inserts from the Lambda GEM-12 library and T7 and T3 primers are
used to sequence the ends of the inserts from the Lambda DASH II library.

Sequencing reactions are performed by eight individuals using an average of
fourteen AB 373 DNA Sequencers per day. All sequencing reactions are analyzed
using the Stretch modification of the AB 373, primarily using a 34 cm well-to-read
distance. The overall sequencing success rate is approximately 85% for
M13-21 and M13RP1 sequences and 65% for dye-terminator reactions. The
average usable read length is 485 bp for M13-21 sequences, 445 bp for M13RP1
sequences, and 375 bp for dye-terminator reactions.

Richards et al., Chapter 28 in AUTOMATED DNA SEQUENCING AND
ANALYSIS, M. D. Adams, C. Fields, J. C. Venter, Eds., Academic Press,
London, (1994) described the value of using sequence from both ends of
sequencing templates to facilitate ordering of contigs in shotgun assembly projects
of lambda and cosmid clones. We balance the desirability of both-end sequencing
(including the reduced cost of lower total number of templates) against shorter
read-lengths for sequencing reactions performed with the M13RP1 (reverse) primer
compared to the M13-21 (forward) primer. Approximately one-half of the
templates are sequenced from both ends. Random reverse sequencing reactions are
done based on successful forward sequencing reactions. Some M13RP1
sequences are obtained in a semi-directed fashion: M13-21 sequences pointing
outward at the ends of contigs are chosen for M13RP1 sequencing in an effort to
specifically order contigs.

4. Protocol for Automated Cycle Sequencing 

The sequencing is carried out using ABI Catalyst robots and AB 373
Automated DNA Sequencers. The Catalyst robot is a publicly available
sophisticated pipetting and temperature control robot which has been developed
specifically for DNA sequencing reactions. The Catalyst combines pre-aliquoted
templates and reaction mixes consisting of deoxy- and dideoxynucleotides, the
thermostable Taq DNA polymerase, fluorescently-labelled sequencing primers, and
reaction buffer. Reaction mixes and templates are combined in the wells of an
aluminum 96-well thermocycling plate. Thirty consecutive cycles of linear
amplification (i.e., one-primer synthesis) steps are performed, including
denaturation, annealing of primer and template, and extension (i.e., DNA
synthesis). A heated lid with rubber gaskets on the thermocycling plate prevents
evaporation without the need for an oil overlay.

Two sequencing protocols are used: one for dye-labelled primers and a
second for dye-labelled dideoxy chain terminators. The shotgun sequencing
involves use of four dye-labelled sequencing primers, one for each of the four
terminator nucleotides. Each dye-primer is labelled with a different fluorescent dye,
permitting the four individual reactions to be combined into one lane of the 373
DNA Sequencer for electrophoresis, detection, and base-calling. ABI currently
supplies pre-mixed reaction mixes in bulk packages containing all the necessary
non-template reagents for sequencing. Sequencing can be done with both plasmid
and PCR-generated templates with both dye-primers and dye-terminators with
approximately equal fidelity, although plasmid templates generally give longer
usable sequences.

Thirty-two reactions are loaded per AB 373 Sequencer each day, for a total
of 960 samples. Electrophoresis is run overnight following the manufacturer's
protocols, and the data are collected for twelve hours. Following electrophoresis
and fluorescence detection, the ABI 373 performs automatic lane tracking and
base-calling. The lane-tracking is confirmed visually. Each sequence electropherogram
(or fluorescence lane trace) is inspected visually and assessed for quality. Trailing
sequences of low quality are removed and the sequence itself is loaded via software
to a Sybase database (archived daily to 8mm tape). Leading vector polylinker
sequence is removed automatically by a software program. Average edited lengths
of sequences from the standard ABI 373 are around 400 bp and depend mostly on
the quality of the template used for the sequencing reaction. ABI 373 Sequencers
converted to Stretch Liners provide a longer electrophoresis path prior to
fluorescence detection and increase the average number of usable bases to 500-600
bp.
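
The removal of low-quality trailing sequence can be illustrated with a simple rule. The text above does not specify the criterion that was used, so the sketch below makes an assumption: a read is cut at the first ambiguous base call (N) inside the first window whose fraction of N calls exceeds a threshold. The function name, window size and threshold are all hypothetical.

```python
def trim_trailing_low_quality(seq, window=20, max_n_fraction=0.1):
    """Trim a read at the first N of the first window that is too ambiguous.

    Assumed rule for illustration only: a window is "low quality" when more
    than max_n_fraction of its base calls are N.
    """
    seq = seq.upper()
    if not seq:
        return seq
    last_start = max(len(seq) - window, 0)
    for start in range(last_start + 1):
        chunk = seq[start:start + window]
        if chunk.count("N") / len(chunk) > max_n_fraction:
            # cut at the first ambiguous call inside the noisy window
            return seq[:start + chunk.index("N")]
    return seq

# A clean 40-base stretch followed by a noisy tail (made-up sequence)
read = "ACGT" * 10 + "ANNGNNTNNA"
trimmed = trim_trailing_low_quality(read)
print(len(read), len(trimmed))
```

A production pipeline would more likely work from per-base signal or quality values; the N-fraction rule is used here only because it needs nothing beyond the base calls themselves.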

INFORMATICS

1. Data Management 

A number of information management systems for a large-scale sequencing
lab have been developed. (For review see, for instance, Kerlavage et al.,
Proceedings of the Twenty-Sixth Annual Hawaii International Conference on
System Sciences, IEEE Computer Society Press, Washington D.C., 585 (1993).)
The system used to collect and assemble the sequence data was developed using the
Sybase relational database management system and was designed to automate data
flow wherever possible and to reduce user error. The database stores and
correlates all information collected during the entire operation from template
preparation to final analysis of the genome. Because the raw output of the ABI 373
Sequencers was based on a Macintosh platform and the data management system
chosen was based on a Unix platform, it was necessary to design and implement a
variety of multi-user, client-server applications which allow the raw data as well as
analysis results to flow seamlessly into the database with a minimum of user effort.

2. Assembly 

An assembly engine (TIGR Assembler) developed for the rapid and
accurate assembly of thousands of sequence fragments was employed to generate
contigs. The TIGR Assembler simultaneously clusters and assembles fragments of
the genome. In order to obtain the speed necessary to assemble more than 10^4
fragments, the algorithm builds a hash table of 12 bp oligonucleotide subsequences
to generate a list of potential sequence fragment overlaps. The number of potential
overlaps for each fragment determines which fragments are likely to fall into
repetitive elements. Beginning with a single seed sequence fragment, TIGR
Assembler extends the current contig by attempting to add the best matching
fragment based on oligonucleotide content. The contig and candidate fragment are
aligned using a modified version of the Smith-Waterman algorithm which provides
for optimal gapped alignments (Waterman, M. S., Methods in Enzymology
164:765 (1988)). The contig is extended by the fragment only if strict criteria for
the quality of the match are met. The match criteria include the minimum length of
overlap, the maximum length of an unmatched end, and the minimum percentage
match. These criteria are automatically lowered by the algorithm in regions of
minimal coverage and raised in regions with a possible repetitive element.
Fragments representing the boundaries of
repetitive elements and potentially chimeric fragments are often rejected based on
partial mismatches at the ends of alignments and excluded from the current contig.
TIGR Assembler is designed to take advantage of clone size information coupled
with sequencing from both ends of each template. It enforces the constraint that
sequence fragments from two ends of the same template point toward one another
in the contig and are located within a certain range of base pairs (definable for each
clone based on the known clone size range for a given library).

The process resulted in 391 contigs as represented by SEQ ID NOs: 1-391. 
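
The first stage of the assembly described above, finding candidate overlaps through a hash table of 12 bp oligonucleotides, can be sketched as follows. This is not the TIGR Assembler code; the function names and the toy fragments are hypothetical, and the sketch stops at counting shared 12-mers without performing the gapped alignment or enforcing the clone-size constraint.

```python
from collections import defaultdict

K = 12  # oligonucleotide length used for the hash table, per the text

def build_kmer_index(fragments, k=K):
    """Map each k-mer to the set of fragment ids that contain it."""
    index = defaultdict(set)
    for frag_id, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)
    return index

def candidate_overlaps(fragments, k=K):
    """Count distinct shared k-mers for every pair of fragments."""
    shared = defaultdict(int)
    for ids in build_kmer_index(fragments, k).values():
        for a in ids:
            for b in ids:
                if a < b:
                    shared[(a, b)] += 1
    return dict(shared)

# Toy fragments: f2 begins with the last 16 bases of f1; f3 is unrelated
f1 = "ACGTTGCAAGGCTTAACCGGTATCGA"
fragments = {
    "f1": f1,
    "f2": f1[10:] + "TTGACCATGCAGGT",
    "f3": "GGGGCCCCAAAATTTTGGCCAATT",
}
overlaps = candidate_overlaps(fragments)
print(overlaps)
```

Pairs with many shared 12-mers are the ones worth aligning, and a fragment that shares 12-mers with an unusually large number of others is a candidate repeat, which is the role the overlap count plays in the description above.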

3. Identifying Genes

The predicted coding regions of the Streptococcus pneumoniae genome
were initially defined with the program GeneMark, which finds ORFs using a
probabilistic classification technique. The predicted coding region sequences were
used in searches against a database of all nucleotide sequences from GenBank
(October, 1997), using the BLASTN search method to identify overlaps of 50 or
more nucleotides with at least a 95% identity. Those ORFs with nucleotide
sequence matches are shown in Table 1. The ORFs without such matches were
translated to protein sequences and compared to a non-redundant database of
known proteins generated by combining the Swiss-prot, PIR and GenPept
databases. ORFs that matched a database protein with BLASTP probability less
than or equal to 0.01 are shown in Table 2. The table also lists assigned functions
based on the closest match in the databases. ORFs that did not match protein or
nucleotide sequences in the databases at these levels are shown in Table 3.
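
The sorting of ORFs into Tables 1-3 amounts to a two-step decision rule, sketched below. The `OrfHits` record and its field names are hypothetical; only the thresholds (an overlap of 50 or more nucleotides at 95% or greater identity, and a BLASTP probability of 0.01 or less) come from the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OrfHits:
    orf_id: str
    best_nt_overlap: int = 0                 # longest BLASTN overlap, in nucleotides
    best_nt_identity: float = 0.0            # percent identity over that overlap
    best_protein_p: Optional[float] = None   # best BLASTP probability, None if no hit

def assign_table(hits):
    """Return 1, 2 or 3 according to the criteria described in the text."""
    if hits.best_nt_overlap >= 50 and hits.best_nt_identity >= 95.0:
        return 1   # matches a known nucleotide sequence
    if hits.best_protein_p is not None and hits.best_protein_p <= 0.01:
        return 2   # matches a known protein
    return 3       # no match at these levels

# Made-up search summaries for three ORFs
examples = [
    OrfHits("orf_a", best_nt_overlap=120, best_nt_identity=98.5),
    OrfHits("orf_b", best_nt_overlap=40, best_nt_identity=99.0, best_protein_p=1e-20),
    OrfHits("orf_c", best_protein_p=0.3),
]
tables = {h.orf_id: assign_table(h) for h in examples}
print(tables)
```

The nucleotide test is applied first, so an ORF that satisfies both criteria is reported only in Table 1, matching the order of the searches described above.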
ILLUSTRATIVE APPLICATIONS

1. Production of an Antibody to a Streptococcus pneumoniae 
Protein 

Substantially pure protein or polypeptide is isolated from the transfected or
transformed cells using any one of the methods known in the art. The protein can
also be produced in a recombinant prokaryotic expression system, such as E. coli,
or can be chemically synthesized. Concentration of protein in the final preparation
is adjusted, for example, by concentration on an Amicon filter device, to the level
of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can
then be prepared as follows.

2. Monoclonal Antibody Production by Hybridoma Fusion 

Monoclonal antibody to epitopes of any of the peptides identified and
isolated as described can be prepared from murine hybridomas according to the
classical method of Kohler, G. and Milstein, C., Nature 256:495 (1975) or
modifications of the methods thereof. Briefly, a mouse is repetitively inoculated
with a few micrograms of the selected protein over a period of a few weeks. The
mouse is then sacrificed, and the antibody producing cells of the spleen isolated.
The spleen cells are fused by means of polyethylene glycol with mouse myeloma
cells, and the excess unfused cells destroyed by growth of the system on selective
media comprising aminopterin (HAT media). The successfully fused cells are
diluted and aliquots of the dilution placed in wells of a microtiter plate where
growth of the culture is continued. Antibody-producing clones are identified by
detection of antibody in the supernatant fluid of the wells by immunoassay
procedures, such as ELISA, as originally described by Engvall, E., Meth.
Enzymol. 70:419 (1980), and modified methods thereof. Selected positive clones
can be expanded and their monoclonal antibody product harvested for use. Detailed
procedures for monoclonal antibody production are described in Davis, L. et al.,
Basic Methods in Molecular Biology, Elsevier, New York. Section 21-2 (1989).

3. Polyclonal Antibody Production by Immunization

Polyclonal antiserum containing antibodies to heterogeneous epitopes of a
single protein can be prepared by immunizing suitable animals with the expressed
protein described above, which can be unmodified or modified to enhance
immunogenicity. Effective polyclonal antibody production is affected by many
factors related both to the antigen and the host species. For example, small
molecules tend to be less immunogenic than others and may require the use of
carriers and adjuvants. Also, host animals vary in response to site of inoculations
and dose, with both inadequate and excessive doses of antigen resulting in low titer
antisera. Small doses (ng level) of antigen administered at multiple intradermal
sites appear to be most reliable. An effective immunization protocol for rabbits
can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991
(1971).

Booster injections can be given at regular intervals, and antiserum harvested
when antibody titer thereof, as determined semi-quantitatively, for example, by
double immunodiffusion in agar against known concentrations of the antigen,
begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of
Experimental Immunology, Weir, D., ed., Blackwell (1973). Plateau concentration
of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 µM).
Affinity of the antisera for the antigen is determined by preparing competitive
binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of
Clinical Immunology, second edition, Rose and Friedman, eds., Amer. Soc. For
Microbiology, Washington, D.C. (1980).

Antibody preparations prepared according to either protocol are useful in
quantitative immunoassays which determine concentrations of antigen-bearing
substances in biological samples; they are also used semi-quantitatively or
qualitatively to identify the presence of antigen in a biological sample. In addition,
antibodies are useful in various animal models of pneumococcal disease as a means
of evaluating the protein used to make the antibody as a potential vaccine target or
as a means of evaluating the antibody as a potential immunotherapeutic or
immunoprophylactic reagent.

4. Preparation of PCR Primers and Amplification of DNA

Various fragments of the Streptococcus pneumoniae genome, such as those 
of Tables 1-3 and SEQ ID NOS: 1-391, can be used, in accordance with the present 
invention, to prepare PCR primers for a variety of uses. The PCR primers are 
preferably at least 15 bases, and more preferably at least 18 bases in length. When 
selecting a primer sequence, it is preferred that the primer pairs have approximately 
the same G/C ratio, so that melting temperatures are approximately the same. The 
PCR primers and amplified DNA of this Example find use in the Examples that 
follow. 
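
The length and G/C-matching preferences stated above can be checked numerically. The following Python sketch is offered only as an illustration and is not part of the disclosed method. It assumes the Wallace rule (2 degrees C per A or T, 4 degrees C per G or C) as a rough melting-temperature estimate; the function names, the 4 degree tolerance, and the two example 18-mers (taken from the two ends of SEQ ID NO: 1 as printed) are illustrative assumptions.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C).

    This rule is an assumption of the sketch; it is only a coarse estimate
    for short oligonucleotides.
    """
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def pair_is_matched(fwd: str, rev: str, min_len: int = 15, max_tm_gap: int = 4) -> bool:
    """True if both primers meet the minimum length and their estimated
    melting temperatures differ by no more than max_tm_gap (hypothetical cutoff)."""
    long_enough = len(fwd) >= min_len and len(rev) >= min_len
    return long_enough and abs(wallace_tm(fwd) - wallace_tm(rev)) <= max_tm_gap


if __name__ == "__main__":
    fwd = "CCAAGCAAAACCAGCTAC"  # first 18 bases of SEQ ID NO: 1
    rev = "CTAGGAAACTAGCCGCAG"  # reverse complement of its last 18 bases
    for name, primer in (("forward", fwd), ("reverse", rev)):
        print(name, len(primer), round(gc_fraction(primer), 2), wallace_tm(primer))
    print("matched:", pair_is_matched(fwd, rev))
```

A stricter design would use a nearest-neighbor melting-temperature model; the simple rule is used here only to make the G/C-matching criterion concrete.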

5. Gene Expression from DNA Sequences Corresponding to ORFs 

A fragment of the Streptococcus pneumoniae genome provided in Tables 1-3 
is introduced into an expression vector using conventional technology. 
Techniques to transfer cloned sequences into expression vectors that direct protein 
translation in mammalian, yeast, insect or bacterial expression systems are well 
known in the art. Vectors and expression systems are commercially 
available from a variety of suppliers including Stratagene (La Jolla, California), 
Promega (Madison, Wisconsin), and Invitrogen (San Diego, California). If 
desired, to enhance expression and facilitate proper protein folding, the codon 
context and codon pairing of the sequence may be optimized for the particular 
expression organism, as explained by Hatfield et al., U.S. Patent No. 5,082,767, 
incorporated herein by this reference. 

The following is provided as one exemplary method to generate 
polypeptide(s) from cloned ORFs of the Streptococcus pneumoniae genome 
fragment. Bacterial ORFs generally lack a poly A addition signal. The addition 
signal sequence can be added to the construct by, for example, splicing out the poly 
A addition sequence from pSG5 (Stratagene) using BglI and SalI restriction 
endonuclease enzymes and incorporating it into the mammalian expression vector 
pXT1 (Stratagene) for use in eukaryotic expression systems. pXT1 contains the 
LTRs and a portion of the gag gene of Moloney Murine Leukemia Virus. The 
positions of the LTRs in the construct allow efficient stable transfection. The 
vector includes the Herpes Simplex thymidine kinase promoter and the selectable 
neomycin gene. The Streptococcus pneumoniae DNA is obtained by PCR from the 
bacterial vector using oligonucleotide primers complementary to the Streptococcus 
pneumoniae DNA and containing restriction endonuclease sequences for PstI 
incorporated into the 5' primer and BglII at the 5' end of the corresponding 
Streptococcus pneumoniae DNA 3' primer, taking care to ensure that the 
Streptococcus pneumoniae DNA is positioned such that it is followed by the poly 
A addition sequence. The purified fragment obtained from the resulting PCR 
reaction is digested with PstI, blunt ended with an exonuclease, digested with 
BglII, purified and ligated to pXT1, now containing a poly A addition sequence 
and digested with BglII. 
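
The primer design described above, with a PstI site on the 5' primer and a BglII site at the 5' end of the 3' primer, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the function names, the 18-base annealing length, and the example insert string are hypothetical. The recognition sequences CTGCAG (PstI) and AGATCT (BglII) are the standard ones, and both are palindromic, so scanning one strand for internal sites is sufficient.

```python
PSTI_SITE = "CTGCAG"   # PstI recognition sequence
BGLII_SITE = "AGATCT"  # BglII recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tailed_primers(insert: str, anneal_len: int = 18) -> tuple[str, str]:
    """Return (5' primer, 3' primer) carrying PstI and BglII sites.

    The annealing length is a hypothetical default; each primer is the
    restriction site followed by the template-matching region.
    """
    insert = insert.upper()
    if len(insert) < anneal_len:
        raise ValueError("insert is shorter than the annealing length")
    forward = PSTI_SITE + insert[:anneal_len]
    reverse = BGLII_SITE + reverse_complement(insert[-anneal_len:])
    return forward, reverse


def internal_sites(insert: str) -> list[str]:
    """Names of the two enzymes whose sites already occur inside the insert."""
    insert = insert.upper()
    sites = {"PstI": PSTI_SITE, "BglII": BGLII_SITE}
    return [name for name, site in sites.items() if site in insert]


if __name__ == "__main__":
    # Illustrative insert only; not asserted to be a complete ORF.
    insert = "ATGGCAATGATAGAAGTGGAACATCTTCAGAAAAATTTTGTGAAGACTGTTAAG"
    fwd, rev = tailed_primers(insert)
    print("5' primer:", fwd)
    print("3' primer:", rev)
    print("internal sites:", internal_sites(insert))
```

Checking for internal sites first matters because an insert that already contains a PstI or BglII site would be cut during the digestion steps.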

The ligated product is transfected into mouse NIH 3T3 cells using 
Lipofectin (Life Technologies, Inc., Grand Island, New York) under conditions 
outlined in the product specification. Positive transfectants are selected after 
growing the transfected cells in 600 µg/ml G418 (Sigma, St. Louis, Missouri). 
The protein is preferably released into the supernatant. However, if the protein has 
membrane binding domains, the protein may additionally be retained within the cell 
or expression may be restricted to the cell surface. Since it may be necessary to 
purify and locate the transfected product, synthetic 15-mer peptides synthesized 
from the predicted Streptococcus pneumoniae DNA sequence are injected into mice 
to generate antibody to the polypeptide encoded by the Streptococcus pneumoniae 
DNA. 

Alternatively, if antibody production is not possible, the Streptococcus 
pneumoniae DNA sequence is additionally incorporated into eukaryotic expression 
vectors and expressed as, for example, a globin fusion. Antibody to the globin 
moiety then is used to purify the chimeric protein. Corresponding protease 
cleavage sites are engineered between the globin moiety and the polypeptide 
encoded by the Streptococcus pneumoniae DNA so that the latter may be freed 
from the former by simple protease digestion. One useful expression vector for 
generating globin chimerics is pSG5 (Stratagene). This vector encodes a rabbit 
globin. Intron II of the rabbit globin gene facilitates splicing of the expressed 
transcript, and the polyadenylation signal incorporated into the construct increases 
the level of expression. These techniques are well known to those skilled in the art 
of molecular biology. Standard methods are published in methods texts such as 
Davis et al., cited elsewhere herein, and many of the methods are available from the 
technical assistance representatives from Stratagene, Life Technologies, Inc., or 
Promega. Polypeptides of the invention also may be produced using in vitro 
translation systems such as in vitro Express™ Translation Kit (Stratagene). 
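
The synthetic 15-mer peptides mentioned above are derived from the predicted DNA sequence. The Python sketch below shows one way such candidates could be enumerated; it is an illustration only, not the method of the Example. It assumes the standard genetic code, translation in the first reading frame up to the first stop codon, and a sliding window of 15 residues. The function names, the window step, and the example DNA string are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string, stopping at the first stop codon.

    Codons containing characters other than A, C, G, T are written as 'X';
    a trailing partial codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def peptide_windows(protein: str, size: int = 15, step: int = 1) -> list[str]:
    """All peptides of the given size, taken every `step` residues."""
    return [protein[i:i + size] for i in range(0, len(protein) - size + 1, step)]


if __name__ == "__main__":
    # Illustrative DNA string only; not asserted to be a complete ORF.
    dna = "ATGGCAATGATAGAAGTGGAACATCTTCAGAAAAATTTTGTGAAGACTGTTAAGGAACCGGGC"
    protein = translate(dna)
    for peptide in peptide_windows(protein, size=15, step=3):
        print(peptide)
```

In practice, candidate peptides would be further filtered for properties such as solubility and predicted surface exposure, which this sketch does not attempt.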

While the present invention has been described in some detail for purposes 
of clarity and understanding, one skilled in the art will appreciate that various 
changes in form and detail can be made without departing from the true scope of 
the invention. 

All patents, patent applications and publications referred to above are 
hereby incorporated by reference. 

(1) GENERAL INFORMATION: 

(i) APPLICANT: Charles Kunsch 
Gil H. Choi 
Patrick S. Dillon 
Craig A. Rosen 
Steven C. Barash 
Michael R. Fannon 
Brian A. Dougherty 

(ii) TITLE OF INVENTION: Streptococcus pneumoniae Polynucleotides and Sequences 

(iii) NUMBER OF SEQUENCES: 391 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Human Genome Sciences, Inc. 

(B) STREET: 9410 Key West Avenue 

(C) CITY: Rockville 

(D) STATE: Maryland 

(E) COUNTRY: USA 

(F) ZIP: 20850 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette, 3.50 inch, 1.4Mb storage 

(B) COMPUTER: HP Vectra 486/33 

(C) OPERATING SYSTEM: MSDOS version 6.2 

(D) SOFTWARE: ASCII Text 



(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Brookes, A. Anders 

(B) REGISTRATION NUMBER: 36,373 

(C) REFERENCE/DOCKET NUMBER: PB340P1 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (301) 309-8504 

(B) TELEFAX: (301) 309-8512 



WO 98/18931 PCT/US97/19588 

(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5625 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

CCAAGCAAAA CCAGCTACAG CTAAAGGAAC TTACGTAACA AACTTGACTA TCACAACTAC 60 

TCAAGGTGTT GGTATCAAAG TTGACGTAAA CTCACTTTAA TCAGTAGTTA AAGTAATGTA 120 

AAAAAGTTGA AGACGCTATG TCTCAACTTT TTTTGATGTA CGACGGGCAT GTTGTATAGT 180 

AGATGTGTAC TATTCTAGTT TCAATCTACT ATAGTAGCTC AGAAGTCGGT ACTTAAACGT 240 

GCTATATCAA AACCAGTCCT TGAAAAACGT GGACTGGTTT CGTGTTTGGA TTATTACCTT 300 

GAACGACATG CGTTAAAAGT TAGTTGAACC GCCGTATGCC GAACGGACGT ACGGTGGTGT 360 

GAGAGGGGCT AGAGATTATC CCCTACTCGA TTTCGAAATC TAGTGGAATG AATCTGGAAT 420 

AGTCCATCGA GCTTTCTAAT ACTCTTCGAA AATCTCTTCA AACCACGTCA ACGTCGCCTT 480 

GCCGTGCGTA TGGTTACTGA CTTCGTCAGT TCTATCCACA ACCTCAAAAC AGTGTTTTGA 540 

GCTGACTACG TCAGTTCCAT CTACAACCTC AAAACAGTGT TTTGAGCAAC CTGCGGCTAG 600 

TTTCCTAGTT TGCTCTTTGG TTTTCATTGA GTATAACACA TTGTTAGAAG TTGGTTTAAA 660 

TTTCCTAATC AGTTTGTTCA CATTTACCTT CGATATATTA TATCCCATAG TTAAGGTTGG 720 

TCATACAGAT GATTATAGTC ATGGAGCCGT AAAACTTAGT GTTTCTTTAG TTGACAAAGA 780 

TGCCATGAAA AAAATATTTG TAACTGTAAT AGGATATTTT GAAATAAATA TAGATGAAAA 840 

TATCACCGAT ATTCTATACG TAAATGGTAC TGCTATTCTT TATCTTTATT TACGTTCAAT 900 

TGTTTCAATA GTTTCGGCAA TTGATAGCAG TGAAGCAATG TTGCTACCTA TCATTAATGT 960 

TTTAGAGTTA CTAGATAAAT CTCAACCTTT TGAAGAAGAA TAATTTATTA GCTCACTAAA 1020 

TTGAGGGTAA GGAAAAGTAA AAGCAGTAAG AAAAATGTCT TGCATTATAC AGCAACCTTT 1080 

TGGGAATGAG TGGATGGATT GAATAAAATT TGATTAAGAG TGGATGATTT ATCTGTAGAT 1140 

TATTATTGGA CAGTTAGTCT TGAAGTAGTC TAAGAATTAG GTTATAATCA GTAGAAGCCT 1200 

TGCTAATAAT GAGGAGGTTA GTTTATGTAT AGTAGACTGA ATCTAAAATA GTACGAAACA 1260 

ATTGCTAAAA CATTTATAGA AATTAATTTT ACTTTCCCAA TCGATTTGTT CTCATCTTAT 1320 

TTCAATCCGC TATATATTAT GGTATCGAAT CTTCATCAGA ATGATAAAAT TAATCAATTG 1380 

ATATCTGATT ACAAACAGAA TATGAAAGCT TTTTATATCA CTATTGAAAA ATTTATACGA 1440 

GATGATGAAA GCCTTAAGTG TTATTTTATA AAGGTTATTT CAAGTCGTTC CAAGGTAACA 1500 

AGTCTAGATC AGATTGAAGC TGATAAAACG ATACAAAGAA AATATTCAAG TGAGCTAAAA 1560 

AAATTTATTG GATTTTATAA TGAGATTATT TGTGAGGAAA ATAGTTTCCT ACATGTACGA 1620 

AAGAGGTGGT CGAGTTGGTT TAGGTAGTCG ATGCGTGAGT TGATAATTCT CAGGGTATGG 1680 

ACTTCTTTTT CATGAATGAG GTAAAAGAGC AGGTATTGTT TAGAGACAAT CATTCTGAGC 1740 

ATATTTTCTG GATAGAGGGA GTATCCGATT TTATGATCAA AGTTAATACC GCCCTCTGGT 1800 

GAGAAGATGA GTAGGTTGGT AATTTAAACT ATTAAACAGA ATTTTTGATT AAAAGTATTA 1860 

TTTCATGAGA GAAATCCTAA TTTCACAATC CATAGGCAAA CGCTTGCATT TCGTTTTTTA 1920 

TTGGACTATA ATAGGTTGGT ATAAAGCCTT CTGTAGTAAT AAAATGTAGA AGGTGTAGAA 1980 

AGTAAGGATT TAGAATATTT GTAGTTAAAA ACACAATGTT GCTATTCCTT ACGATAGGGA 2040 

GATAGATATG GCAATGATAG AAGTGGAACA TCTTCAGAAA AATTTTGTGA AGACTGTTAA 2100 

GGAACCGGGC TTGAAGGGGG CTTTGCGCTC CTTTATTCAT CCTGAAAAGC AGACCTTTGA 2160 

AGCGGTCAAG GATTTGACCT TTGAGGTTCC AAAAGGGCAG ATT TT AGO AT TTATCGGGGC 2220 

AAATGGTGCT GGGAAGTCGA CAACCATTAA AATGCTGACA GGAATTTTGA AACCAACATC 2280 

TGGTTTTTGT CGGATTAACG GCAAGATTCC CCAGGACAAT CGGCAAGATT ATGTCAAAGA 2340 

TATTGGCGTA GTCTTTGGAC AACGCACCCA GCTATGGTGG GATTTGGCTC TGCAAGAGAC 2400 

CTACACTGTC TTAAAAGAGA TTTATGATGT GCCAGACTCG CTCTTTCATA AGCGTATGGA 2460 

CTTTTTGAAT GAAGTCTTGG ATTTGAAGGA CTTTATCAAG GATCCCGTGC GGACTCTTTC 2520 

ACTGGGACAA CGGATGCGGG CGGATATTGC GGCCTCCTTG CTCCACAATC CCAAGGTTCT 2580 

TTTTTTAGAT GAGCCGACCA TTGGTTTGGA CGTTTCGGTT AAGGATAATA TTCGTCGGGC 2640 

AATTACTCAG ATCAATCAAG AGGAAGAAAC TACCATTCTT TTGACCACTC ACGATTTGAG 2700 

TGATATTGAG CAACTTTGTG ATCGGATTTT CATGATTGAC AAGGGGCAAG AGATTTTTGA 2760 

TGGAACGGTG AGCCAACTCA AGGAGACCTT TGGTAAGATG AAGACTCTCT CTTTTGAACT 2820 

GCTACCAGGT CAAAGTCATC TCGTCTCTCA CTATGACGGT CTGTCTGATA TGACCATTGA 2880 

TAGACAAGGA AACAGCCTCA ACATTGAATT TGATAGTTCT CGCTACCAGT CAGCTGACAT 2940 

TATCAAGCAA ACCCTGTCTG ATTTTGAAAT CCGCGATTTG AAGATGGTGG ATACGGATAT 3000 

TGAGGATATT ATCCGTCGCT TCTACCGAAA GGAGCTCTAG GATGATCAAA TTGTGGAGAC 3060 

GTTATAAACC CTTTATCAAT GCAGGGGTTC AGGAGTTGAT TACTTACCGA GTCAACTTTA 3120 

TTCTCTATCG GATTGGCGAT GTCATGGGGG CTTTTGTGGC CTTTTATCTC TGGAAGGCTG 3180 

TCTTTGATTC TTCGCAAGAG TCTTTGATTC AGGGCTTCAG TATGGCGGAT ATCACCCTCT 3240 

ACATCATCAT GAGTTTTGTG ACCAATCTTC TGACTAGATC CGATTCGTCC TTTATGATTG 3300 

GGGAGGAGGT CAAGGATGGC TCCATTATCA TGCGTTTGTT GCGACCAGTG CATTTTGCGG 3360 

CCTCCTATCT TTTCACCGAG CTTGGTTCCA AGTGGTTGAT TTTTATCAGC GTTGGCCTTC 3420 

CATTTTTAAG TGTCATTGTC TTGATGAAAA TCATATCGGG TCAAGGTATT GTAGAGGTGC 3480 

TAGGATTAAC TGTCATTTAT CTTTTTAGCT TAACGCTCGC CTATCTGATT AACTTTTTCT 3540 

TTAATATTTG CTTTGGATTT TCAGCCTTTG TGTTTAAAAA TCTTTGGGGT TCCAACCTAC 3600 

TTAAGACTTC CATAGTGGCT TTTATGTCGG GGAGTTTGAT TCCCTTGGCA TTTTTTCCAA 3660 

AGGTTGTTTC AGATATTCTC TCCTTTTTGC CTTTTTCATC CTTGATTTAT ACTCCAGTTA 3720 

TGATCATTGT TGGAAAATAC GATGCCAGTC AGATTCTTCA GGCACTCCTT TTGCAGTTCT 3780 

TCTGGCTCTT AGTGATGGTG GGATTGTCTC AGTTAATTTG GAAACGGGTC CAGTCCTTTA 3840 

TCACCATTCA AGGAGGTTAG TATGAAAAAA TATCAACGAA TGCATCTGAT TTTTATCAGA 3900 

CAATACATCA AACAAATCAT GGAATATAAG GTAGATTTTG TGGTTGGTGT CTTGGGAGTC 3960 

TTTCTGACTC AAGGCTTGAA TCTCTTGTTT CTCAATGTCA TCTTTCAACA TATTCCATTC 4020 

CTAGAAGGCT GGACCTTTCA AGAGATAGCT TTCATTTATG GATTTTCCTT GATTCCCAAG 4080 

GGAATGGACC ATCTCTTTTT TGACAATCTC TGGGCACTAG GGCAACGCCT AGTCCGAAAA 4140 

GGGGAGTTTG ACAAGTATCT GACTCGTCCC ATCAATCCTC TCTTTCACAT CCTAGTTGAA 4200 

ACCTTTCAGA TTGATGCCTT GGGTGAACTC TTAGTCGGTG GTATTTTATT GGGAACAACA 4260 

GTGACCAGCA TTGTTTGGAC TCTTCCAAAA TTCCTGCTTT TCCTAGTTTG TATTCCTTTT 4320 

GCGACCTTGA TTTATACTTC TCTTAAAATC GCAACAGCCA GTATCGCCTT TTGGACTAAG 4380 

CAGTCAGGCG CCATGATTTA CATCTTCTAT ATGTTCAATG ACTTTGCTAA GTATCCGATT 4440 

TCTATTTACA ATTCTCTTCT TCGTTGGTTG ATTAGCTTTA TCGTGCCTTT CGCCTTTACA 4500 

GCCTACTATC CAGCTAGCTA TTTCTTACAG GAAAAGGATG TGTTCTTTAA CGTAGGAGGT 4560 

TTGATGTTGA TTTCTCTGGT TTTCTTTGTT ATTTCCCTTA AACTTTGGGA TAAGGGCTTA 4620 

GATTCCTACG AAAGTGCGGG TTCGTAAAAG CTAAAGTAAG ACTAAAATCA AGAAAGAAAC 4680 

TTATGATGTT TGTAATTGAA GAAGTCAAGG ATGAAAATCA AAAAAAGGCA GTTGTCGCTG 4740 

AGGTTTTGAA GGATTTGCCA GAATGGTTTG GAATCCCAGA AAGCACACAA GCCTATATAG 4800 

AAGGAACCAC GACACTGCAA GTTTGGACCG CCTATCAGGA GAGTGATTTG ACTAGATTTG 4860 

TAAGCTTATC CTATTCGAGT GAAGATTGTG CAGAGATTGA TTGTCTCGGC GTAAAAAAGC 4920 

TTATCAAGGT AGAAAAATTG GGAGCCAATT GCTTGCTACT TTAGAGAGTG AAGCTCGTAA 4980 

AAAAGTTGGT TATCTGCAGG TCAAAACAGT GGCAGAAGGT TCTAATAAAG ATTATGATCG 5040 

AACAAATGAC TTTTATCGAG GTCTTGGCTT TAAAAAGTTA GAGATTTTTC CTCAACTATG 5100 

GAATCCGCAA AATCCTTGTC AGATTTTGAT TAAAAAGCTT GAATAATATT ACTTGACATC 5160 

TATTCTCAGA GTGCTATACT GTAAGTGTAA TCGCCGATTT AGCTTAGTTG GTAGAGCAAG 5220 

GCACTCGTAA AGCCTAGGTT ATAGGTAGAT AAACGACTGA GGATTTGAAA AAATAGATAG 5280 

GTAGAAGATA ACCGTTAAGC CTTACTCTTA GCGGTTATTT ATATTGTTTA ATAGCGCTAA 5340 

TATTTTATCA ATTATGCCTG TTTTCGTGTT TCTGGTAGTT GTTCAAGTTT ATTGCTACTA 5400 

TTTTTGATGG TATGAATGTG CTTATAATGT ATCCCGGTTA ACGAAAGTTT TGGACTTATA 5460 

CTCTTCGAAA ATCTCTTCAA ACCACGTCAA CGTCGCCTTG CCGTGCGTAT GGTTATGACT 5520 

TCGTCAGTTC TATCCACAAC CTCAAAACAG TGTTTTGAGT GACTACGTCA GTTCCATCTA 5580 

CAACCTCAAA ACACTGTTTT GCCCAATCTG CGGCTAGTTT CCTAG 5625 
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7571 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 



CTCTCCAGCT TTCCTTGCGA GTTGGCCATG TTGTGTCTTT AAGAAGTCTA AAAATATCTC 60 

CAATAAAACG CATCGCTCTC TCCTATCTCG TTTCTCTGTG TGTAGTGTAC TTGCCACAAT 120 

GCTTACAAAA TTTATTTACT TCTAGTCGTG TAGGCTTGAG GTTTCCGCTG ATCTTGATTG 180 

AATAGTTTCT CGAACCACAA ACCGCACAAG CTAGGCTTGC TTTTTTTAGT GCCATAACGC 240 

CTCCATCTTA TCCATTATAA CAAGAAAGCT AGGCTTTGAC AAGCATCTTA GCGAAATAGA 300 

TTGACTATCG AATCCCATAT TGTTTGAGCC TTTTCCTTAA TCTTCGCATC TGAGATAGCC 360 

CGGCTAGCCT CATCTACTAG ACTTTGCGCA CGCCCTCGAA TATCAGACAA ATTATCATCT 420 

GTCTGGCTAT TATCATTGGT TTGTACTTGT CTTTTTGTAT TGGCTGGTGC AATTCCATTT 480 

TGCTTATAAG CATTTTCAAC CGTAAAGGTA CTTCCTGGCG TATAAGGTAA AATGGTATTG 540 

GCAATGTTTC TAAAGACATG AGCTGCACCG TTTGAAGTAG AGCCAGCTAG ATAGTGGTTT 600 

TCATCAGTGG TCGGAAAGCC AAGCCAGTGG CTAATCACTA CATCCGGAGT ATAACCAATT 660 

ACCCACTGGT CACTTGTGTA CTCCGGATTG AAAACTGCTT CAGTTGTTCC AGTTTTCCCT 720 

GCCATGACAT AGTCTGCAGG CGATGAACTA ATACCGGTAC CGTTGGTGAA AGTCCCCAAC 780 

ATCATACTGG TCATCTTGTC AGCTACAGAC TTATCAATCA CCCGTTTTTG TGAATTTTTA 840 

TGACTCGCAA TAACTTGTCC ACTAGCATTT TCAATTCTAC TAATAAAATG AGCTTCAGGC 900 

ATTAAACCTT CATTTGCAAA GGCGGCGTAT GCTTGAGCCA TTTGAAGAGG GTTGGTTTCA 960 

ACACCGCTTC CCAAGGCGAC ACCAAGAACA CGGTCGACCT TTTCCATGTT GAGTCCGAAT 1020 

TTTTCGCCTG CCTCAAAAGC CTTGTCGACA CCCAAATCAT TAACAGTGGC AACAGCAGGT 1080 

AGATTAAGCG ATTCTGCCAA GGCTTGATAC ATAGGAACTT CTCGACTCGT TTTGATCCCT 1140 

GCATAGTTAT CAACCTTATA GCTGTCATAC TGCATGGTAT GGTTATCCAA CTGCTTATTC 1200 

AAAGCCCAGC TTGCTTCAAC TGCTGGCGTA TAAACAACTA AAGGCTTAAT TGTAGAACCA 1260 

GGACTACGCT TTGATTGGGT TGCATAGTTG AAATTCCGGA ATCCAGTTTT ATCATTGTCA 1320 

GCAACTTGAC CGACAACTCC ACGAACTCCC CCTGTTTTCG GTTCGAGGGC TACACTTCCT 1380 

GATTGAGCAA ACGTTCCATC CTCTGCCCTC GGAAATAGCG ATGTGTTTTC ATAAACAATC 1440 

TGCATATTTG CTTGGTAGTT TTGGTCCAGC TCTGTGTAAA TGCGGTAGCC ATTATTGACA 1500 

ATCTCTTCCT CTGTTAGATT ATACTTGGAA ACAGCTTCAT TAACCACCGC ATCAAAATAA 1560 

GAGGGGTAAC GGTAATCTGA GATTTTTCCT TCATACTTAT CGTGCAATTG CGAAGTCATA 1620 

TCAACTTCAG CAGCTTTGGT TTCTTGGTTT TTATCAATAT ATCCTGCTGC AACCATATTC 1680 

TGCAAGACAG TATCGCGCCG ATTAGTAGAA TCTTCTACGG AATTCAAGGG ATTATACAGT 1740 

TCCGGCCCCT TGAGCATCCC TGCCAGAGTC GCAGCTTGAT CCAGACTCAC TTCTGATGCA 1800 

GAAACTCCAA AGTATTTCTT ACTCGCATCT TCTACACCCC ACACACCATT TCCAAAATAA 1860 

GCGTTGTTAA GGTACATGGT TAGAATTTGC TCCTTACTAT ATTTTTTGCT TAATTCTAAG 1920 

GCAAGGAAAA ATTCTTTCGC TTTTCTCTCA ACAGTTTGAT CCTGCGATAA ATAGGCGTTT 1980 

TTAGCCAGCT GTTGGGTAAT GGTAGAGCCA CCACCTGAAC GTCCAGCAGT GACAATAGCC 2040 

AAGAAAAAAC GGCCATAGTT AATCCCGTCA TTTTTATAGA AAGAACGGTC TTCTGTCGCA 2100 

ATAACAGCAT TCTGCAAGTT TTTACTGATG TCAGTCAGCT CAACATAGGT TCCCTTTTGA 2160 

CCAGACAAGG CACCAGCCTC TTTTTCTTCA CGGTCAAAAA TAAGAGTCCG AGTTTTCAAG 2220 

GCATTTTGCA AATCATTGAC ATTGGTCGAC TTGGCTACAG CAAACAAATA GATTCCAACT 2280 

AGCAAGCCTG CACTCAAACC TAGTATAAGG ATAATCTTTG TTAGATGATA ACGACGCCAG 2340 

AATTTTCGAA TCGGACCTAC TTGGGCTAAT TTTTTTCGAT CACTACGAGA GCGACGTAAG 2400 

ATAGTAGAAT CAGAGTCCTC TAGTTCACTT GTTTCTTTTT TAAAAAGAGA AAGAAATTTC 2460 

TCAAATAATT TATCTAATTT CATGCGTTTA TTTTATCATC TTCATCATAG GAAGACAAGA 2520 

ATTTAGCTAT TTCCTATCCA AATAGGGCTT TTTTTGTTAC AATATCTGTA TGCAATTCAC 2580 

ATTTACATTA CCCGCCTCTC TACCTCAAAT GACAGTAAAG CAATTACTTG AGGAACAACT 2640 

CCTCATCCCT AGAAAAATCC GTCATTTTTT GAGAATCAAG AAACATATTT TGATAAATCA 2700 

AGAAGAAGTC CACTGGAAGG AAATCGTAAA TCCTGGAGAT GTTTGCCAGT TGACTTTTGA 2760 

CGAGGAAGAT TATTCCCAAA AGACGATCCC TTGGGGCAAC CCAGACTTAG TGCAGGAAGT 2820 

TTATCAAGAT CAACACTTGA TTATTGTAAA CAAACCAGAG GGGATGAAAA CGCATGGTAA 2880 

TCAACCAAAC GAAATTGCCC TTCTTAACCA TGTCAGTACC TATGTTGGCC AAACCTGCTA 2940 

TGTCGTTCAT CGTCTGGACA TGGAAACCAG TGGCTTAGTT CTCTTTGCCA AAAATCCTTT 3000 

TATCCTGCCC ATTCTCAATC GCTTATTGGA GAAAAAAGAG ATTTCTAGAG AATATTGGGC 3060 

TCTAGTTGAT GGAAATATCA ACAGAAAAGA ACTTGTTTTC AGAGACAAAA TTGGACGTGA 3120 

TCGCCATGAT CGTAGAAAAA GAATAGTTGA TGCAAAAAAT GGGCAATATG CTGAAACGCA 3180 

TGTAAGCAGA TTAAAGCAAT TCTCAAACAA GACTTCCTTG GCTCATTGCA AGCTAAAGAC 3240 

AGGGCGAACC CATCAGATTC GTGTGCACCT TTCGCATCAT AATCTTCCTA TCCTGGGAGA 3300 

CCCTCTCTAT AATAGTAAAT CAAAGACAAG CCGGCTTATG CTTCATGCCT TCCGACTTTC 3360 

CTTTACCCAC CCACTTACTT TAGAGAAGCT AACTTTCACT ACCCTTTCAA ATACATTTGA 3420 

AAAAGAATTA AAAAAGAATG GATGATCGTG TCATCCATTT TTCCATATAA AAAAGCAAGA 3480 

CCACAAAGCC TTGCTTTCTA TCAACTCAAG AATTATTTAG CAATTTTTGC GAAGTATTCA 3540 

AGAGTACGAA CAAGTTGTGC AGTGTATGAC ATTTCGTTGT CGTACCATGA TACAACTTTA 3600 

ACCAATTGTT TACCGTCAAC GTCAAGAACT TTAGTTTGAG TTGCGTCAAA CAATGAACCG 3660 

TAAGACATAC CTACGATATC TGAAGATACG ATTGGATCTT CTGTGTAACC GTATGATTCG 3720 

TTTGAAGCTG CTTTCATAGC TGCGTTCACT TCATCAACAG TAACGTTCTT TTCAAGAACT 3780 

GCTACCAATT CAGTAACTGA TCCAGTTGGA GTTGGAACGC GTTGTGCAGA TCCGTCAAGT 3840 

TTACCATTCA ATTCTGGGAT TACAAGACCG ATAGCTTTTG CAGCACCAGT TGAGTTAGGA 3900 

ACGATGTTTG CAGCACCAGC GCGAGCACGG CGAAGGTCAC CACCACGGTG TGGTCCGTCA 3960 

AGGATCATTT GGTCACCAGT GTAAGCGTGG ATAGTAGTCA TCAATCCTTC AACAACACCA 4020 

AAGTTGTCTT GAAGAGCTTT AGCCATTGGA GCCAAGCAGT TTGTAGTACA TGAAGCACCT 4080 

GAGATAACTG TTTCAGTACC GTCAAGAACG TCGTGGTTAG TGTTGAATAC AACTGTTTTA 4140 

ACGTCGTTTC CACCAGGAGC AGTGATAACA ACTTTTTTAG CTCCACCTTT AAGGTGTTTT 4200 

TCAGCTGCTT CTTTCTTAGC AAAGAAACCA GTAGCTTCAA GAACGATTTC TACACCGTCA 4260 

GTAGCCCAGT CGATTTGTTC TGGATCACGT TCAGCAGAAA CTTTGATGAA TTTACCGTTA 4320 

ACTTCAAATC CACCTTCTTT AACTTCAACA GTACCGTCGA AACGACCTTG AGTTGTGTCG 4380 

TATTTCAACA AGTGTGCAAG CATAACTGGA TCTGTAAGGT CGTTGATGCG TGTAACTTCA 4440 

ACACCTTCTA CGTTTTGGAT ACGACGGAAA GCAAGACGAC CGATACGTCC GAAACCGTTA 4500 

ATACCAACTT TAACTACCAT TAGTGATTTC CTCCTTATGA AAATCATGAA ATTTTTATTG 4560 

TGAAAAGAGT AACTTGAATC ACTACAAATC ACCTTTCAAC AAACCTATTA TACAACTATT 4620 

TGAGTTGAAT TGCAAGTATG GCCATTGTTT TTCTATGTTA GTTTCTTTTT AAGACTGTAA 4680 

ACCAAGGAAT CCCTTACTAT TCATAGCATA ACGATTCTAT AGGATCCATT TTACTAATCT 4740 

TACGCGCCGG GAAGTAGGCT GAGACATAAC CAAGTAATAG AGCGAAAACT AGAGTTCCTA 4800 

AAACAGATAA AAGATTTAAT TTAAAAACCT TAGTGATGGA TGGGTAAAAG TGACTTACAA 4860 

TCGCATTCGC CAAACTTCCC ACCCCTTGTG CAACCAAAAA TGCCAGCAGC AAGGCGATGC 4920 

CTACAATCCA GATAGCCTCG TAAATAAAAA TTCCTTTGAC ATCACGATTC TGATAACCAA 4980 

CTGCTTTCAT GACACCTATT TCCTTGGAAC GTTGCATGAT ATTGATGTAA ATAATGATAC 5040 

CAATCATAAC CGCTGCTACC ACAATAGCTT GTGATGAAAG CACAATCAAT AATCCCTGAA 5100 

TAACACGAAT AAAGGTAATC ACAATATCAA GAACTCTCTG TTGAGAAAGC ACAGTATACT 5160 

TCTTATTTTT CTGTAATTCT TCTGTTACTA CTTTTGTCTG TGATGGATCT TTGAGTTCCA 5220 

AGATAAAATA AGATACAGCT TTCGTAAATC CAGCCTCTTT CAAAATCGTT TCCATTTGAT 5280 

GAGACAGCAT GAAACTGTTG CTGTCCTCCA TGTCATCTTC ATCATTGATT ACACGTACAA 5340 

TCTTCGTTTG AAATTGAGCA ATCTTACTAG TTTCGGCAGC ACTTTCTACA ATGCTGGCTG 5400 

AGACTGATTT GCCAATAAGA TCATTAGCTG TCAAATTTTT TCCTGTCTGT TCATTCCAAT 5460 

TTTTTAGTAA ACTGCTTGGA ATCGTTAATC CCTGTTCATT TGTATCAGTA TAGAGGGATC 5520 

CAGCCAACAC TTTGTCCGTC TCATTATTAC TAACAGAGAT ACTTGTATCA TCATAAAGAC 5580 

TCACTACTTG AGCATAAGAA GGCATCGTTT GACTCAGATC CATTTCTTGC CCATCTATAG 5640 

TAATATTTGA CATGTTCATC CCAAAAGGAC TCTCCAAATA TTTAATAGCT TCTTTCCCAA 5700 

CTGTATCCGT GATATATAGT CAATTGAAAC AAGAGCAGGA TAAAAAAGCC TCGTAAAAGG 5760 

TATTGCAACT TGGTAATACC TTTTTGAGGT GCTTTTTGAT ATGAGCCCAT GTTTTCTCAA 5820 

TAGGATTGTA CTCAGGCGAG TAGGGAGGAA GAGGTAAAAG TTTATGCCCA AACTCTTCGC 5880 

ATAAAAGTTC TAGCTTCCCC ATTCTATGGA ATCTTACATT ATCCATAATA ATAACCGATG 5940 

GTGTGTTTAA TGTTGGTAAG AGAAAATTCT GAAACCAAGC TTCAAAAAAG TCGCTCGTCA 6000 

TCGTCTCTTC GTAAGTCATT GGAGCGATTA ATTCACCATT TGTTAGACCT GCAACCAAAG 6060 

AAATCCTCTG ATATCTTCTT CCAGATACTT TGCCTCTTAT TAATTGACCT TTTAATGAGC 6120 

GACCATATTC TCGATAAAAA TAAGTATCGA ATCCTGTTTC GTCAATCTAA ACAGGTGCTA 6180 

GGTGCTTTAA ACTATTAAAA TTCTTAAGAA ATAAGGCTAC TTTTTCTGGG TCTTGTTCAT 6240 

AGTAGGTGTG GTTCTTTTTT CGAGTGTAGC CCATAGCTTT GAGCGTATAG TGGATGGTAG 6300 

TTGGATGACA GCCAAATTCA GAAGCTATTT CAGTCAAATA AGCGTCTGGA TTGTCAGTAA 6360 

GATAGTTTTT AAGTCTATCT CTATCAACCT TTCTTGGTTT TATTCCTTTT ACTTGGTGGT 6420 

TTAGCTCTCC TGTTTTCTCT TTTAGCTTTA ACCAGCCATA AATGGTATTA CGTGAGATTT 6480 

GGAAAACGTG TGATGCTTCT GTTATACTAC CTGTTCGCTC ACAATAAGAG AGAACTTTTT 6540 

TACGAAAATC TATTGAATAT GCCATAAAAA GATTATACCA CATTGTGTAC TATTTTTGGT 6600 

TCATTTTACT ATATTTGAAG AGGCGTTTAA ACTATCTGAC ATAAAACTCG TTCTAGAGGA 6660 

AAGACATCCT TTAAAAAGTT AGTTTATTTT ACAACTTAGA CATCAAGGTA GGTTAACCCC 6720 

TTCATGGAAA AATCAAGACT CTTAGCACTA TGGGTTAAAC TACCACTGGA GACGTAATCA 6780 

ATCGCTAAAC CACGAAAACG GCTAATAGTG GTCATATCAA TATTTCCAGA ACATTCAATC 6840 

CGAGAACGTC CTGCAATTAG GGTAATGGCC TGTTCAATCT GTTCCAATGA CATATTATCC 6900 

AACATGATAA TATCAGCACC CGCCGCCGCA GCTTCTTCGG CAGCAGCAAG GCTTTCCACT 6960 

TCCACCTCGA CCATTTTCAC AAAAGGGGCA TAGGCACGCG CTTGAGCAAT TGCCTTTTGA 7020 

ACACTACCTA CTGCCGCAAT GTGATTGTCT TTTAGCAGGA TAGCATCTGA TAAATTAAAG 7080 

CGATGATTAT AGCCACCGCC AACTCTCACG GCATATTTCT CAAAAAGACG TAAATTAGGA 7140 

GTAGTTTTTC GAGTATCAAA TACCTTAATG CAATCATCGC CTAAGGCTTC TACATAAGCA 7200 

GCTGTCATCG AAGCAATCCC TGATAAATGT TGTAAAAAAT TCAAGGCAAC GCGTTCACAT 7260 

GTTAAGAGAC TTCTCACCGA GCCTATGATT TCTAAAACCA AATCGCCACT AGTCAAACGA 7320 

TCCCCATCCT TAAATTGATG AGGATTCTGG AAGGTCACCT CGGCATCAAA TAGGGTAAAA 7380 

ACCCTTTGAA AAACGGTTAG CCCCGCTAAA ACACCAGCTT CCTTGGCAAA AAGCGACACC 7440 

TTGGCTTGGC CATGATGATC AAAAATGGCA TTGGTACTGT AATCTTCGGA ATGAACATCT 7500 

TCTCGCAAGG CTGCTTTCAA TGTATCATCT ATTTGAAAAG GGGTTAAATC AGTTGAAATG 7560 

ATTGACATCA C 7571 
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26365 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

TTTGCTAGTG GCTTAAATTC TTCAGGAAAA TCAGGCGTAT CTAAAAGTCG TGTCGTTTTT 60 

GTTTCATCTA TATAAAGACT TCCTGCTCCC CCTACAACTA GAAAACGTGT CTGTGTTCCA 120 

GCAAGAAGCT GATTAAATAG TTCGATTGAT TTGCTGTGGA GCGGTAGCGT ATCTGGTGTA 180 

TAAGCACCAA ACGCTGAAAT AACAGCATCA AATCCAGTAA GATCATCTTT TGTCAACTCA 240 

AATAAATCTT TTTTAATAAT AGACTCAGCT TGACTTTTGT TTTCAGAACG AACAATAGCC 300 

GTTACTTCAT GTCCTCGTTT GACTGCTTCT TCAACAATTG CTTTCCCCGC TTGTCCATTT 360 

GCTGCAATAA CTGCTAGTTT CATTTTTTAT ACCTCTCTTG TTGTAATTAT TTTAGTTACA 420 

GAAATTGTGA CACTCTTAAT AATCAATGTC AATAGTCTTG CTTAATTATT ATCAAAATAT 480 

TTCTACCAAG AAAACTAACC ATGATTCTAG TGAAAAAAAA TCTTCTTTGT CAACAAATTT 540 

ACTTTCTTGT TTTAAACATG CTATAATAAT CATAGCAAGA GATCTAAGTT GTCTGTTTTT 600 

TTAAAACGAG GTGATTATCA TGCGTAGATT CTATTCCCAT CTCCCCTACT ATCTGGTCAT 660 

ATTATTCTTT TATTGGCCAC TTTATGAGTT GTTCTTACTA GTTGTTTCTG ACCCCCTTAC 720 

ACTCAAGGGA CTCTATATAA ACAATCTTCT CTTCTTTACA CCTCTGGTAA TCTTGATTGT 780 

ATCGTTACTC TATAGCTACC GTTTCCGTTT CTCACTTTGA TGGTTAGTTG GTAACGGACT 840 

GCTCTTTTAC TTTACTATCA TAACCTTTGG TGAGTTTATA CTAATTTACT TGCTAATCTA 900 

TGAAACAGTT GCTCTGGTCG GCATGGATTC TGGTATTAGC ATCAAGCATA TTCTACAAAA 960 

AATGAAAAAC AAAAAACTTT CACAAAATCC TTGAAAAATC TCACAATCAT GCTATAATAA 1020 

TCCATAGAGA CAAGTCACTT AGTCCCTTTC TACTAGAGAG TGCGTGGTTG CTGGAAACGC 1080 

ATAGGAAGTC TAAACTGATA CTACTCTTGA GTTTTTTATG AAAACATAAA ACGGTGGCCA 1140 

CGTTAGAGCC GATCAGAGGT GTCCCTCTCT TTTGAGGTAC ATAAATGAAG GTGGAACCAC 1200 

GTTGCGACGT CCTTTCGAGG ATGTCGCATT TTTTTATTAG GATACTAATT ATGGAGTTGC 1260 

AAGAATTAGT GGAGCGCAGT TGGGCAATCC GACAAGCTTA TCACGAACTG GAAGTTAAGC 1320 

ATCATGATTC CAAGTGGACG GTAGAAGAAG ACCTCTTGGC TTTATCTAAT GATATTGGAA 1380 

ATTTCCAACG ACTGGTGATG ACAAAGCAAG GACGCTACTA TGATGAAACA CCCTACACAC 1440 

TGGAACAAAA ACTTTCAGAA AATATCTGGT GGCTATTAGA ACTTTCTCAA CGTTTGGATA 1500 

TAGACATTCT GACGGAAATG GAAAACTTCC TCTCTGATAA AGAAAAGCAA TTGAACGTTA 1560 

GGACTTGGAA GTAGTCTGCT GATAAAAAAT CAATGCTTAG AAACTATGAA ATAATAAAAA 1620 

AGGAGAACAT CATGATTAAC ATTACTTTCC CAGATGGCGC TGTTCGTGAA TTCGAATCTG 1680 

GCGTAACAAC TTTTGAAATT GCCCAATCTA TCAGCAATTC CCTAGCTAAA AAAGCCTTGG 1740 

CTGGTAAATT CAACGGCAAA CTCATCGACA CTACTCGCGC TATCACTGAA GATGGAAGCA 1800 

TCGAAATTGT GACACCTGAT CACGAAGATG CCCTTCCAAT CTTGCGTCAC TCAGCAGCTC 1860 

ACTTGTTCGC CCAAGCAGCT CGTCGTCTTT TCCCAGACAT TCACTTGGGA GTTGGTCCAG 1920 

CCATCGAAGA TGGTTTCTAC TACGATACTG ACAACACAGC TGGTCAAATC TCTAACGAAG 1980 

ACCTTCCTCG TATCGAAGAA GAAATGCAAA AAATCGTCAA AGAAAACTTC CCATCTATTC 2040 

GTGAAGAAGT GACTAAAGAC GAGGCACGTG AAATCTTCAA AAATGACCCT TACAAGTTGG 2100 

AATTGATTGA AGAACACTCA GAAGACGAAG GCGGTTTGAC TATCTATCGT CAGGGTGAAT 2160 

ATGTAGACCT CTGCCGTGGA CCTCACGTTC CATCAACAGG TCGTATCCAA ATCTTCCACC 2220 

TTCTCCATGT AGCTGGTGCG TACTGGCGTG GAAACAGCGA CAACGCTATG ATGCAACGTA 2280 

TCTACGGTAC AGCTTGGTTT GACAAGAAAG ACTTGAAAAA CTACCTTCAA ATGCGTGAAG 2340 

AAGCTAAGGA ACGTGACCAC CGTAAACTTG GTAAAGAGCT TGACCTCTTT ATGATTTCAC 2400 

AAGAAGTGGG ACAAGGTTTG CCATTCTGGT TGCCAAATGG TGCGACTATC CGTCGTGAAT 2460 

TGGAACGCTA CATCGTAAAC AAAGAGTTGG TTTCTGGCTA CCAACACGTC TACACTCCAC 2520 

CACTTGCTTC TGTTGAGCTT TACAAGACTT CTGGTCACTG GGATCATTAC CAAGAAGACA 2580 

TGTTCCCAAC CATGGACATG GGTGACGGGG AAGAATTTGT CCTTCGTCCA ATGAACTGTC 2640 

CGCACCACAT CCAAGTTTTC AAACACCATG TTCACTCTTA CCGTGAATTG CCAATCCGTA 2700 

TCGCTGAAAT CGGTATGATG CACCGTTACG AAAAATCTGG TGCCCTCACT GGCCTTCAAC 2760 

GTGTACGTGA AATGTCACTC AACGACGGTC ACCTATTCGT TACTCCAGAA CAAATCCAAG 2820 

AAGAATTCCA ACGTGCCCTT CAGTTGATTA TCGATGTTTA TGAAGACTTC AACTTGACTG 2880 

ACTACCGCTT CCGCCTCTCT CTTCGTGACC CTCAAGATAC TCATAAGTAC TTTGATAACG 2940 

ATGAGATGTG GGAAAATGCC CAAACCATGC TTCGTGCAGC TCTTGATGAA ATGGGCGTGG 3000 

ACTACTTTGA AGCCGAAGGT GAAGCAGCCT TCTACGGACC AAAATTGGAT ATCCAGATTA 3060 

AAACTGCCCT TGGAAAAGAA GAAACCCTTT CTACTATCCA ACTTGATTTC TTGTTGCCAG 3120 

AACGCTTCGA CCTCAAATAC ATCGGAGCTG ATGGCGAAGA TCACCGTCCA GTCATGATCC 3180 

ACCGTGGGGT TATCTCAACT ATGGAACGCT TCACAGCTAT CTTGATTGAG AACTACAAGG 3240 

GGGCCTTCCC AACATGGCTG GCACCACACC AAGTAACCCT CATCCCAGTA TCTAACGAAA 3300 

AACACGTGGA CTACGCTTGG GAAGTGGCCA AGAAACTCCG TGACCGCGGT GTCCGTGCAG 3360 

ACGTAGATGA GCGCAATGAA AAAATGCAGT TCAAGATCCG TGCTTCACAA ACCAGCAAGA 3420 

TTCCTTACCA ATTAATTGTT GGAGACAAAG AAATGGAAGA CGAAACAGTC AACGTTCGTC 3480 

GCTACGGCCA AAAAGAAACA CAAACTGTCT CAGTTGATAA TTTTGTTCAA GCTATCCTAG 3540 

CTGATATCGC CAACAAATCA CGCGTTGAGA AATAAGAGTC TAGCATAAAA GCCTCCAATC 3600 

TGGAGGCTTT TTCTCATCTA TTTTTACTCA AGGACTAAGT TCACTTGAGC AAACTGAATC 3660 

CGCACTGTCG TTCCTTTTCC GACCTCAGAC TCGATACGAA TCTGGTGCCC CAGTTCTTCA 3720 

GAAATTTTCT TAGATAGATA AAGGCCAAGT CCAGAGGACT GCTGGGTCAA ACGGCCATTG 3780 

TATCCTGAAA AGCCACGTTC AAATACTCGG AGGACATCAC TGTTTTTTAT CCCGATTCCC 3840 

GTATCTTTGA TACAAAGCTC TTGGTCATCC ATATAAATCT CCAGACCACC TTCCTTGGTG 3900 

TACTTGAGAC TGTTTGAGAT GATTTGCTCA ATAACCACTA GCAGCCACTT TTTATCCGTC 3960 

ACGATTTCTT TATCAAGGTC ATGTAGATTG ACATTTAAGC CTTTTTGAAT AAAGAAAAGA 4020 

GCATATTTAC GAATTATTTC CTTGACCAAG TCCTCAATTT GAACCTGCTT TAAGACCAAA 4080 

TCATCATGGA AACTTTCTAA ACGCAGGTAC TGTAAAACTA GGTTGGTATA GGAGTCGATT 4140 

TTGAAAATTT CCTGTTCTAG CTGCTGCTTC AGTTGGCGGT CGACCACTTC TGCAACTAAG 4200 

AGTTGACTGG CTGCAATGGG GGTCTTTATC TGATGGACCC ACAAGGTATA GTAATCCAGC 4260 

AAATCCGTCA GTTTTCTTTC TGCTTTTGAC CTCTGCTGAT AGAGTTCCAT CTCACGCGCT 4320 

TCTAATTTTT CTGCTAAAGC TATTTCCAAA GGAGACTTGG CTTCCCTCTC TCCATAGAGA 4380 

AGTTCCTGGC GATAGACCTG CGTTTCCACC AATATGTCCC AAGTGAAAAA TAATATGGTT 4440 

ACAAAGCAAC ACAAGAAGAA AAAGTAGAGG AAGTAAATTC CTAGACTGGC AAATAAAAAC 4500 

TGAAAGAGTA AGACAAGAAA TGCCAAAGAA AGCAGATAGA TAAAAAGACG ACTACGGGAG 4560 

CGCAGATAGG CTAGAAAAAA TTGTTTCCAA TCAAGCATGC TTCAATCCGT ACCCTATTCC 4620 

TTTCTTGGTC TCGATAAATC CTACCAATCC CTGCTCCTCC AACTTTTTAC GCAAACGAGC 4680 

CACATTGACA GAGAGGGTAT TATCATCAAT GAAAAAGTCA CTGTTCCAAA GTTCCCGCAT 4740 

CAGGTCGTCA CGTGCTACGA TGTTGCCTGC ATGCTCAAAT AACACGCGTA AAATCTGGAA 4800 

TTCATTCTTG GTCAAATTCA AGACTTGCCC TTGATAATGT AAATCCATGG ATTTGGTATT 4860 

GAGGATAACA CCAGCATATT CCAGCAAACT CTCATCACGC CCAAACTCAT AGGAACGACG 4920 

CAACAAGCCC TGAACCTTAG CTAAAAGAAC CTGCTGGTCA AAAGGCTTGG TCACAAAGTC 4980 

ATCCGCCCCC ATATTGATTG CCATGACAAT ATCCATAGCC TGGTCTCTCG AAGAAAGAAA 5040 

CATGATAGGT ACCTTGGAAA TCTTGCGGAT TTCCTGACAC CAGTGATAAC CATTAAACAA 5100 

GGGCAAACCA ATATCCATGA GGACCAGATG AGGTTCCGAC TGAACAAATA GACTCAAAAC 5160 

TTCCATAAAG TCTTCTACCA GGACCACTTC AAATCCCCAT TCAGAGAGCA TTTTCCCAAT 5220 

CTGTTGACGA ATGACCTGAT CATCTTCTAT TAATAAAATC TTGTGCATGC GCTTCTCCTT 5280 

TTCCATTATT ATAACAGATT TTTCCATGCT AGATGGTCTG AAACTGAATT TGAAATAGCC 5340 

TGTTTTTAGC CAGTACAAAC AGGCTATGCT ACTAGCTAAT TTGAGGGAAA TTTGCTAAGA 5400 

TAAATAAAAA GAAAGGAGCT CTTATGGCCA ATATTTTTGA CTATCTGAAA GATGTCGCAT 5460 

ATGATTCTTA TTACGACCTT CCCTTGAATG AGTTAGACAT TCTAACCTTA ATAGAAATCA 5520 

CCTACCTCTC CTTTGATAAT CTGGTCTCCA CACTTCCTCA ACGTCTTTTA GATCTAGCAC 5580 

CTCAGGTTCC AAGAGATCCC ACCATGCTTA CTAGCAAAAA TCGCCTTCAA TTATTAGATG 5640 

AATTGGCTCA ACACAAGCGC TTCAAAAATT GCAAACTCTC CCATTTTATC AACGACATCG 5700 

ACCCTGAACT GCAAAAGCAA TTTGCGGCTA TGACTTATCG TGTCAGCCTC GATACCTATC 5760 

TGATTGTCTT TCGTGGGACA GATGACAGTA TCATTGGCTG GAAGGAAGAT TTCCACCTGA 5820 

CCTATATGAA GGAAATTCCT GCTCAAAAGC ACGCCCTTCG CTATTTAAAG AACTTTTTTG 5880 

CCCATCATCC TAAGCAAAAG GTTATTCTAG CTGGGCATTC CAAGGGAGGA AATCTCGCTA 5940 

TCTATGCTGC TAGCCAAATT GAGCAAAGTT TGCAAAATCA GATCACAGCA GTTTATACAT 6000 

TTGATGCACC TGGTCTCCAT CAAGAATTGA CACAGACTGC GGGTTATCAA AGGATAATGG 6060 

ATAGAAGCAA GATATTCATT CCACAAGGTT CCATTATCGG TATGATGCTG GAAATTCCTG 6120 

CTCACCAAAT CATCGTTCAG AGTACTGCCC TGGGTGGCAT CGCCCAGCAC GATACCTTTA 6180 

GTTGGCAGAT TGAGGACAAG CACTTCGTCC AACTGGATAA GACCAACAGT GATAGCCAGC 6240 

AAGTAGACAC AACCTTTAAA GAATGGGTGG CCACAGTCCC TGACGAAGAA CTTCAGCTCT 6300 

ACTTCGACCT CTTCTTTGGC ACTATTCTTG ATGCTGGTAT TAGCTCTATC AATGACTTGG 6360 

CTTCCTTAAA GGCGCTTGAA TACATTCATC ATCTCTTTGT CCAAGCTCAA TCCCTCACTC 6420

CAGAAGAAAG AGAAACCTTG GGTCGCCTTA CCCAGTTATT GATTGATACT CGTTACCAGG 6480 

CATGGAAAAA TAGATAATAC TCTTGAAAAT TAAATGTATA CAAAACAAAA GACCTAGAAT 6540 

ACATACTTTC ATGTGCATTC TAAGTCTTTT TAAATAGAAT CTAATAGTCA ATAAAAATCA 6600 

AAGAGCATTG AGAGATAATG GGGCTTGGAA CGTCCCTCTC GCTTCAACAA AATGACCCCA 6660 

TTATAGATTA AAAAGATGCC ACTTAGAAAA AGCAAAAAAG GAAGTAAGAC AAAGGCAAAT 6720 

ATATAAAAAG CTAACTGAAC ATTCTCGTAT CCATTTTTAT AAAAAAGGTA GGATAGATAA 6780 

AAATAACTTG AAATGAGGGA TAATAAAAAT AATACTGGAT TCCACAAACT TCTATTATCC 6840 

TTCCAAAATG ACACTATAAA GGCTAATACA ATTCCTATAA CGAGATACAT TTCTTACTCC 6900

TTTAATAGCT ACATTTTATC ATAATTATCC AAAGAAAAAA GAGGGCATTT ATCCCTCTTA 6960

ATCCTTCATC TGACTCTCTG CATCGGCCAC GACTTTTTCT AGACTGGTTT GACCAAGTTC 7020 

TGCCTCCATA GTCAACTGAA TTCTCTCCAA TTTTTGATCC AAAACATCAT GAATATGAGC 7080 

TCCTACAGGG CAATTTGGAT TCGGATTGTC ATGGAAACTG AAGAGTTGAC CTGTCTTACC 7140 

AAGACATTCG ACCGCCTGAT AAACATCTAA AAGACTAATA TCCTTAAGGT CCTTGACAAT 7200 

CTCTGTTCCG CCCGTTCCAC GCGCTACTGA AATCAGCTCT GCCTTCTTCA ACTGGGACAA 7260 

GATCTTTCTG ATAATGACAG GATTGACCCC GACACTAGCA GCCAGAAAAT CACTGGTCAC 7320 

CTTGCTTTCC TTCCCCTCGA GGGCAATGAT TATCAGCATA TGAGTCGCAA TGGTAAATCT 7380

ACTTGGAATT TGCATCCTCT TCTCCTTTTT ACGAGGCTAC CCTGCCTCTA CTCTTCTTTT 7440 

TCTATTATTA TACCCTTTTT AGTTGTAATG TCAATCGTTA CCACTTTTCA ACCAGTCGTC 7500 

TAACTCCCGA TCGCAGCCCT CTTTCTGAGC CAATTCTCTC AAAAATTCCT GATGATGAGT 7560 

ATGGTGGATC CCATTGACCA GACTTTCATA GTAAACCTCA AAATAGGGAA GTCTCAGGTC 7620 

TTTAGCCAGC TGCAATTCAG CTGCTACATC GTAGTCTACC CGTCGGAAGT CCATATCTAC 7680 

CAGGCCTTTG TCATCAAACT CCAAAATCAT ATACTGGGCC CGCAAGTCCT TCCGTAGCTG 7740 

AGCGTCCAAA AAGAAAGGTT GGCCAATCGA ACCCGGATTG ACAATCAATT GCCCACCAGT 7800 

CCCGTAACGA AGCAACTGCT GGTGAATATG TCCATAAACA GCAATATCAC AGGGAGGATG 7860 

AGTCACCAAG CGGTCAAACT CCTCTTGTTT GCCAGTATGA ATCAACTCTC GCCCCCAGTT 7920 

CTTATCAGGC AGATGATGGC TAATTCCCAC CGTCAAATCC CCAAACTGAC GATGAATTTG 7980 

AAGAGGTTGA TTGTGGAGCA CTTCAATTTC TTCTAGGGAA ATTTCCTCTA AAACATACTG 8040 

GCACTGGCGC AAGAGATAGC GTTGACTGGG GCGAGTACTG TCCAATTCCT TACGGACACC 8100 

ATGCCAAAGA CTGTCTTCCC AGTTTCCCAA AACTCTAGCC GTAATCGGTA GTTGATCCAA 8160 

CAAGTCCAAA ATCCTTCTAC GCCCTGTCCC TGGCATGAGA ATATCTCCCA AAAGCCAGTA 8220 

TTCATCCACT CCTATCTGCC GAGCATCTGC CAAAACAGCC TCCAAGGCGG TGGTATTTCC 8280 

ATGAATATCT GAAAGAAGAG CTATTTTCGT CATATCCATC TCCTCGTTTT TTCTCTTGCA 8340

ATAAGTATAA CATAAAAAGT CACAGCTAGA GAAATCTAGC TTTTTTTGAT ATACTAGATA 8400 

AAGATATTAG ACAAGAGGAA ACGAATGACC CCAAACAAAG AAGACTATCT AAAATGTATT 8460 

TATGAAATTG GCATAGACCT GCATAAGATT ACCAACAAGG AAATTGCGGC TCGCATGCAA 8520 

GTCTCTCCCC CTGCCGTAAC TGAAATGATC AAACGAATGA AAAGTGAAAA TCTCATCCTA 8580 

AAGGACAAGG AATGTGGCTA TCTACTGACT GACCTCGGTC TCAAACTGGT CTCTGAGCTC 8640 

TATCGTAAGC ACCGCTTGAT TGAAGTTTTT CTAGTTCATC ATTTAGACTA TACAAGTGAC 8700

CAGATTCACG AGGAAGCTGA GGTCTTGGAA CACACTGTCT CTGACCTGTT CGTGGAAAGA 8760

CTAGATAAAC TGCTAGGTTT CCCTAAAACC TGCCCCCACG GGGGAACTAT TCCTGCCAAG 8820 

GGAGAACTAC TCGTTGAAAT CAATAACCTC CCACTAGCTG ATATCAAGGA AGCTGGCGCC 8880 

TACCGCCTGA CTCGGGTGCA CGATAGTTTT GACATTCTCC ATTATCTGGA CAAGCACTCA 8940 

CTTCACATCG GTGACCAGCT CCAAGTCAAG CAGTTTGATG GCTTCAGCAA TACCTTCACT 9000 

ATCCTCAGTA ACGACGAGGA TTTACAAGTG AATATGGACA TTGCAAAACA ACTCTATGTC 9060 

GAGAAAATCA ACTAATTTCT CAAGTCCCCT ACCAACCCTG AAAGTTTTAT TTTGGCTCTT 9120 

TGTCAACTGT AGTGGGTTGA AGTCAGCTAA GCTCGAGAAA GGACAAATTT TGTCCTTTCT 9180 

TTTTTGATAT TCAGAGCGAT AAAAATCCGT TTTTTGAAGT TTTCAAAGTT CCGAAAACCA 9240 

AAGGCATTGC GCTTGATAAG TTTGATGAGA TTATTGGTCG CTTCCAGTTT GGCATTAGAA 9300 

TAGTGTAGTT GAAGGGCGTT GACAATCTTT TCTTTATCTT TGAGGAAGGT TTTAAAGACA 9360 

GTCTGAAAAA TAGGATGAAC CTGCTTTAGA TTGTCCTCAA TGAGTCCGAA AAATTTCTCC 9420 

GGTTTCTTAT TCTGAAAGTG AAACAGCAAG AGTTGATAGA GCTGATAGTG GTGTTTCAAG 9480 

TCTTGTGAAT AGCTCAAAAG CTTGTCTAAA ATCTCTTTAT TGGTTAAGTG CATACGAAAA 9540 

GTAGGACGAT AAAATCGCTT ATCACTCAGT TTACGGCTAT CCTGTTGTAT GAGCTTCCAG 9600 

TAGCGCTTGA TAGCCTTGTA TTCATGGGAT TTTCGATCCA ATTGGTTCAT AATTTGAACA 9660

CGCACACGAC TCATAGCACG GCTAAGATGT TGTACAATGT GAAAGCGATC CAACACGATT 9720 

TTAGCATTCG GGAGTGAAAC AGTCTGGGAG ACTGTTTCAG CCTGAGCCTA GAAATTTGAA 9780 

AGCGAAGCTG TTTAGCCAAG TCATAGTAAG GACTAAACAT ATCCATCGTA ATGATTTTCA 9840 

CTTGACAACG AACGGCTCTA TCGTAGCGAA GAAAGTGATT TCGGATGACA GCTTGTGTTC 9900 

TGCCTTCAAG AACAGTGATA ATATTAAGAT TATCAAAATC TTGCGCAATG AAACTCATCT 9960 

TTCCCTTAGT GAAGGCATAC TCATCCCAAG ACATAATCTT TGGAAGCCGA GAAAAATCAT 10020 

GCTCAAAGTG AAAGTCATTG AGCTTGCGAA TGACAGTTGA AGTTGAAATG GCCAGCTGAT 10080 

GGGCAATATC AGTCATAGAA ATTTTTTCAA TTAACTTTTG AGCAATyTTT TGGTTGATGA 10140 

TACGAGGGAT TTGGTGATTT TTCTTTACCA GGGGAGTCTC AGCAACCATC ATTTTTGAAC 10200 

AGTGATAGCA CTTGAAACGA CGCTTTCTAA GGAGAATTCT AGAAGGCATA CCAGTCGTTT 10260 

CAAGATAAGG AATTTTAGAA GGTTTTTGAA AGTCATATTT CTTCAATTGG TTTCCGCACT 10320 

CAGGGCAAGA TGGGGCGTCG TAGTCCAGTT TGGCGATGAT TTCCTTGTGT GTATCCTTAT 10380 

TGATGATGTC TAAAATCTGG ATATTAGGGT CTTTAATGTC TAGTAATTTT GTGATAAAAT 10440

GTAATTGTTC CATATGATTC TTTCTAATGA GTTGTTTTGT CGCTTTTCAT TATAGGTCAT 10500

ATGGGACTTT TTTTCTACAA TAAAATAGGC TCCATAATAT CTATAGTGGA TTTACCCACT 10560 

ACAAATATTA TAGAACCGTA AAAATAGAAG GAGATAGCAG GTTTTCAAGC CTGCTATCTT 10620 

TTTTTGATGA CATTCAGGCT GATACGAAAT CATAAGAGGT CTGAAACTAC TTTCAGAGTA 10680 

GTCTGTTCTA TAAAATATAG TAGATTGAAA TAAGATGTGA ACAACTCTAT CAGGAAAGTC 10740 

AAATTAATTT ATAGAATTAT TTTAGCAGTC AAGGTGTACT GTTATAGATT CAATATATTA 10800 

TATGACTATT AACCTTGTCT TCTCCTAAAA TTGACTTTCT TGTTTTCTTA TCTTGTCCAC 10860 

TCGAAACAAG TATTGTAAGA ATTTGATTAT TTTTGAAAGT ACTTTTAATA TACTTGATAT 10920 

AGTTAAAAAA GATTTGAAAC TAAATTCCAA ATTAGAAAAA GACTTGAAAT ACTAAAAAAA 10980 

AAAAAGTATA CTCTAATTGA AAACGGTAAC AAAACTAATT TAGAGAATGA AATATAGAGT 11040 

ATTTCTCTCT TAAAAGTTTT TGGTGAAACG AGATGTAGAA AGGAGATTTA GCCAAAGAGT 11100 

CTATTAGTGC TAGAATAATA GATTAGAATT ATTTTAGAAA AACGAAGTGA GCAGCTTATA 11160 

AATTCAAGTC CCCAAATAGA TTCATACTAG TATCTTTTGC AAAAAATAAA GGGCGACTTC 11220 

CTTCATGAAT ATCAATTTCA TCTATAAGGA AGGTAGCTAA TTGAACTAAC TTATTTATTC 11280 

TGTTTGTCGC TAGAAAAATC AGACCTCCTT GTGAAGATTG AGGAGATACT TAATGAAAAT 11340 

CAAAGAAGAA ACTAGCAAGC TAGTAGCAGA TTGCCCAAAA CACCGCTTTG AGGTTGTAGA 11400 

TAAGACTGAC CTATATAATC CAAGGTGAAG CGACTGTGGT TTGAAGAGAT TTTCAAAGAG 11460 

TATAGGCTAG AGAGTAGTGT TTTTATGTCC TTCTAGTAGA AAATGCTAGA CAGAAGAATG 11520 

GGGAACTTGG ATAGGAAAAA TAGATTGAGA AAGGAGGTTA GAAGAGATGA TTATTACAAA 11580 

AATTAGCCGT TTAGGAACTT ATGTGGGAGT AAATCCACAT TTTGCAACAT TAATAGATTT 11640 

TCTAGAAAAA ACAGGACTAG AAAATTTAAC AGAAGGTTCG ATTGCTATCG ATGGTAATCG 11700 

ATTGTTTGGG AATTGCTTTA CTTATCTAGC AGATGGTCAA GCAGGGGCTT TCTTTGAAAC 11760 

CCACCAAAAA TATTTGGATA TTCATTTAGT TTTGGAAAAC GAAGAAGCCA TGGCTGTTAC 11820 

ATCGCCGGAA AATGTAAGCG TTACCCAAGA ATATGATGAA GAGAAAGATA TTGAATTATA 11880 

CACAGGGAAA GTGGAACAGT TGGTTCATTT GAGAGCTGGC GAATGCCTCA TCACTTTTCC 11940 

AGAAGATTTA CATCAACCCA AGGTTCGTAT AAATGATGAA CCTGTGAAAA AAGTTGTCTT 12000 

TAAAGTTGCG ATTTCTTAAT GTAGAAAGAG AAGAACGATG AAAAAAATGA GAAAGTTTTT 12060 

ATGTCTAGCT GGAATTGCGC TAGCGGCTGT TGCCTTGGTA GCTTGTTCAG GAAAAAAAGA 12120 

AGCTACAACT AGTACTGAAC CACCAACAGA ATTATCTGGT GAGATTACAA TGTGGCACTC 12180 

CTTTACTCAA GGACCCCGTT TAGAAAGTAT TCAAAAATCA GCAGATGCTT TCATGCAAAA 12240

GCATCCAAAA ACGAAAATCA AGATTGAAAC ATTTTCTTGG AATGACTTCT ATACTAAATG 12300

GACTACAGGT TTAGCAAATG GAAATGTGCC AGATATCAGT ACAGCTCTTC CTAACCAAGT 12360 

AATGGAAATG GTCAACTCAG ATGCTTTGGT TCCGCTAAAT GATTCTATCA AGCGTATTGG 12420 

ACAAGATAAA TTTAACGAAA CTGCCTTAAA TGAAGCAAAA ATCGGAGATG ATTACTACTC 12480 

TGTTCCTCTT TATTCACATG CACAAGTCAT GTGGGTTAGA ACAGATTTGT TAAAAGAACA 12540 

TAATATTGAG GTTCCTAAAA CTTGGGATCA ACTCTATGAA GCTTCTAAAA AATTGAAAGA 12600 

AGCTGGAGTT TATGGCTTGT CTGTTCCGTT TGGAACAAAT GACTTAATGG CAACACGTTT 12660 

CTTGAACTTC TACGTACGTA rTGGTGGAGG AAGCCTCTTA ACAAAAGATC TTAAAGCAGA 12720 

CTTGACAAGC CAACTTGCTC AAGATGGTAT TAAATACTGG GTTAAATTGT ATAAAGAAAT 12780 

CTCACCTCAA GATTCTTTGA ACTTTAATGT CCTTCAACAA GCTACCTTGT TCTATCAAGG 12840

AAAAACAGCA TTTGACTTTA ACTCTGGCTT CCATATCGGA GGAATTAATG CCAACAGTCC 12900 

TCAATTGATT GATTCGATTG ATGCTTATCC TATTCCAAAA ATCAAAGAGT CTGATAAAGA 12960 

CCAAGGAATT GAAACCTCAA ACATTCCAAT GGTTGTTTGG AAAAATTCAA AACATCCAGA 13020 

AGTTGCTAAA GCATTCTTAG AAGCACTTTA TAATGAAGAA GACTACGTTA AATTCCTTGA 13080 

TTCAACTCCA GTAGGTATGT TGCCAACTAT TAAGGGGATT AGCGATTCTG CAGCCTATAA 13140 

AGAAAATGAA ACTCGTAAGA AATTTAAACA TGCTGAAGAA GTAATTACTG AAGCTGTTAA 13200 

AAAAGGTACT GCTATTGGTT ATGAAAATGG GCCAAGTGTA CAAGCTGGTA TGTTGACTAA 13260 

CCAACACATT ATTGAACAAA TGTTCCAAGA TATCATTACA AATGGAACAG ATCCTATGAA 13320 

AGCAGCAAAA GAAGCAGAAA AACAATTAAA TGATTTATTT GAGGCTGTTC AGTAGATGTA 13380 

AAAGACTAGA AAATAGGTGG GATAGTGAGC TGAAAAGCTC TAGCCCAATC TTGTAAAAGA 13440 

AGGGAGAAGG AGAATGGTTA AAGAACGTAA TTTAACTCGC TGGATATTTG TTTTGCCAGC 13500 

TATGATTATC GTAGGATTAC TCTTTGTTTA TCCGTTTTTC TCGAGTATTT TTTATAGCTT 13560 

TACCAATAAG CATTTGATTA TGCCTAATTA TAAATTTGTT GGTTTGGCTA ACTATAAAGC 13620 

TGTGCTATCA GATCCCAACT TCTTTAATGC GTTCTTTAAT TCAATTAAGT GGACCGTTTT 13680 

CTCATTAGTT GGTCAAGTTT TAGTAGGGTT TGTATTGGCT TTAGCTCTTC ACAGAGTACG 13740 

CCACTTCAAG AAATTATATA GGACATTATT GATTGTTCCT TGGGCATTTC CTACCATCGT 13800 

TATTGCCTTC TCTTGGCAGT GGATTCTAAA CGGGGTTTAT GGCTACTTAC CTAATCTAAT 13860 

CGTAAAATTA GGTTTAATGG AACATACACC TGCATTTTTG ACAGATAGTA CATGGGCATT 13920 

CCTATGTTTG GTGTTTATCA ACATTTGGTT TGGAGCACCA ATGATTATGG TTAATGTGCT 13980

TTCAGCTTTG CAAACAGTAC CAGAAGAACA ATTTGAGGCT GCTAAGATAG ATGGTGCTTC 14040

AAGTTGGCAG GTGTTCAAGT TTATCGTCTT TCCACATATT AAAGTGGTTG TAGGACTTCT 14100 

AGTTGTTTTG AGAACTGTAT GGATCTTTAA TAACTTTGAC ATTATCTACC TCATTACTGG 14160 

TGGTGGACCA GCCAATGCTA CAACGACGCT TCCAATTTTT GCTTACAACC TGGGCTGGGG 14220 

AACTAAATTG TTGGGTCGTG CTTCAGCAGT TACAGTACTG CTCTTTATCT TCTTGGTGGC 14280 

GATTTGCTTT ATCTACTTTG CTATCATCAG TAAGTGGGAA AAGGAGGGTA GAAAATAATG 14340 

AAGAAGAAAT CCAGTATTTA TTTAGATATT CTCTCACATG TACTTTTAGT TGGTGCGACC 14400 

ATCGTTGCAG TTTTCCCATT GGTATGGATT ATCATATCTT CTGTCAAAGG GAAAGGGGAA 14460 

TTAACTCAGT ATCCAACACG ATTTTGGCCT GAACAGTTTA CATTAGATTA TTTCACTCAT 14520 

GTTATCAACG ATTTGCACTT CATTGATAAC ATTCGAAACA GTTTAATCAT TGCCTTGGCT 14580 

ACAACCCTTA TTGCGATTAT TATTTCTGCT ATGGCAGCCT ATGGTATTGT TCGATTCTTT 14640 

CCTAAATTGG GAGCAATCAT GTCGAGACTA CTCGTCATTA CCTACATTTT CCCACCAATT 14700 

TTGTTAGCAA TTCCCTATTC AATTGCCATT GCTAAAGTTG GGTTAACAAA TAGTTTATTT 14760 

GGCTTGATGA TGGTTTATCT ATCTTTTAGT GTTCCATATG CAGTTTGGCT CTTAGTTGGA 14820 

TTTTTCCAAA CAGTTCCAAT TGGAATTGAA GAAGCGGCTA GAATTGATGG TGCAAATAAA 14880 

TTTGTTACGT TTTATAAAGT TGTGCTACCG ATTGTAGCAC CAGGTATTGT AGCAACAGCT 14940 

ATTTATACAT TTATCAATGC TTGGAATGAA TTCCTGTATG CCTTGATTTT GATTAACAAT 15000 

ACAGGAAAGA TGACAGTAGC AGTAGCCCTT CGTTCACTTA ATGGTTCAGA AATACTAGAC 15060 

TGGGGAGATA TGATGGCAGC GTCTGTTATT GTAGTTCTTC CATCAATTAT TTTCTTCTCT 15120 

ATCATCCAAA ATAAGATTGC AAGTGGATTA TCAGAAGGAT CTGTGAAGTA GACGAAAGAA 15180 

GGAAAAAAAT GAATAAAAGA GGTCTTTATT CAAAACTAGG AATTTCCGTT GTAGGCATTA 15240 

GTCTTTTAAT GGGAGTCCCC ACTTTGATTC ATGCGAATGA ATTAAACTAT GGTCAACTGT 15300 

CCATATCTCC TATTTTTCAA GGAGGTTCAT ATCAACTGAA CAATAAGAGT ATAGATATCA 15360 

GCTCTTTGTT ATTAGATAAA TTGTCTGGAG AGAGTCAGAC AGTAGTAATG AAATTTAAAG 15420

CAGATAAACC AAACTCTCTT CAAGCTTTGT TTGGCCTATC TAATAGTAAA GCAGGCTTTA 15480 

AAAATAATTA CTTTTCAATT TTCATGAGAG ATTCTGGTGA GATAGGTGTA GAAATAAGAG 15540 

ACGCCCAAAA GGGAATAAAT TATTTATTTT CCAGACCAGC TTCATTATGG GGAAAACATA 15600 

AAGGACAGGC AGTTGAAAAT ACACTAGTAT TTGTATCTGA TTCTAAAGAT AAAACATACA 15660 

CAATGTATGT TAATGGAATA GAAGTGTTCT CTGAAACAGT TGATACATTT TTGCCAATTT 15720 

CAAATATAAA TGGTATAGAT AAGGCAACAC TAGGAGCTGT TAATCGTGAA GGTAAGGAAC 15780 




ATTACCTCGC AAAAGGAAGT ATTGATGAAA TCAGTCTATT TAACAAAGCA ATTAGTGATC 15840 

AGGAAGTTTC AACTATTCCC TTGTCAAATC CATTTCAGTT AATTTTCCAA TCAGGAGATT 15900 

CTACTCAAGC TAACTATTTT AGAATACCGA CACTATATAC ATTAAGTAGT GGAAGAGTTC 15960 

TATCAAGTAT TGATGCACGT TATGGTGGGA CTCATGATTC TAAAAGTAAG ATTAATATTG 16020 

CCACTTCTTA TAGTGATGAT AATGGGAAAA CGTGGAGTGA GCCAATTTTT GCTATGAAGT 16080 

TTAATGACTA TGAGGAGCAG TTAGTTTACT GGCCACGAGA TAATAAATTA AAGAATAGTC 16140 

AAATTAGTGG AAGTGCTTCA TTCATAGATT CATCCATTGT TGAAGATAAA AAATCTGGGA 16200 

AAACGATATT ACTAGCTGAT GTTATGCCTG CGGGTATTGG AAATAATAAT GCAAATAAAG 16260 

CCGACTCAGG TTTTAAAGAA ATAAATGGTC ATTATTATTT AAAACTAAAG AAGAATGGAG 16320 

ATAACGATTT CCGTTATACA GTTAGAGAAA ATGGTGTCGT TTATAATGAA ACAACTAATA 16380 

AACCTACAAA TTATACTATA AATGATAAGT ATGAAGTTTT GGAGGGAGGA AAGTCTTTAA 16440

CAGTCGAACA ATATTCGGTT GATTTTGATA GTGGCTCTTT AAGAGAAAGG CATAATGGAA 16500 

AACAGGTTCC TATGAATGTT TTCTACAAAG ATTCGTTATT TAAAGTGACT CCTACTAATT 16560 

ATATAGCAAT GACAACTAGT CAGAATAGAG GAGAGAGTTG GGAACAATTT AAGTTGTTGC 16620 

CTCCGTTCrr AGGAGAAAAA CATAATGGAA CTTACTTATG TCCCGGACAA GGTTTAGCAT 16680 

TAAAATCAAG TAACAGATTG ATTTTTGCAA CATATACTAG TGGAGAACTA ACCTATCTCA 16740 

TTTCTGATGA TAGTGGTCAA ACATGGAAGA AATCCTCAGC TTCAATTCCG TTTAAAAATG 16800 

CAACAGCAGA AGCACAAATG GTTGAACTGA GAGATGGTGT GATTAGAACA TTCTTTAGAA 16860 

CCACTACAGG TAAGATAGCT TATATGACTA GTAGAGATTC TGGAGAAACA TGGTCGAAAG 16920 

TTTCGTATAT TGATGGAATC CAACAAACTT CATATGGCAC ACAAGTATCT GCAATTAAAT 16980 

ACTCTCAATT AATTGATGGA AAAGAAGCAG TCATTTTGAG TACACCAAAT TCTAGAAGTG 17040 

GCCGCAAGGG AGGCCAATTA GTTGTCGGTT TAGTCAATAA AGAAGATGAT AGTATTGATT 17100 

GGAAATACCA CTATGATATT GATTTGCCTT CGTATGGTTA TGCCTATTCT GCGATTACAG 17160 

AATTGCCAAA TCATCACATA GGTGTACTGT TTGAAAAATA TGATTCGTGG TCGAGAAATG 17220 

AATTGCATTT AAGCAATGTA GTTCAGTATA TAGATTTGGA AATTAATGAT TTAACAAAAT 17280 

AAAGGAGAAA AACATGGTTA AATACGGTGT TGTTGGAACA GGGTATTTTG GAGCTGAATT 17340 

GGCTCGCTAC ATGCAAAAGA ATGATGGAGC AGAGATTACT CTTCTCTATG ATCCAGATAA 17400 

TGCAGAGGCG ATTGCAGAAG AATTGGGAGC AAAAGTAGCA AGTTCCTTAG ATGAGTTGGT 17460 

TTCTAGCGAT GAAGTAGATT GTGTTATCGT CGCAACTCCA AATAATCTTC ATAAGGAACC 17520

GGTTATTAAG GCTGCACAGC ATGGTAAAAA TGTTTTCTGT GAAAAACCAA TTGCGCTTTC 17580

TTATCAAGAT TGTCGCGAGA TGGTAGATGC GTGTAAAGAA AACAATGTAA CCTTTATGGC 17640

AGGACATATT ATGAATTTCT TTAATGGTGT TCATCATGCA AAAGAACTCA TTAATCAAGG 17700

AGTTATCGGA GACGTTCTAT ATTGTCATAC AGCTCGTAAT GGTTGGGAAG AACAACAACC 17760

GTCAGTATCA TGGAAAAAAA TTCGTGAAAA ATCAGGTGGT CACTTGTATC ACCACATCCA 17820

TGAATTGGAT TGCGTTCAAT TCCTTATGGG GGGCATGCCT GAAACTGTAA CCATGACAGG 17880

TGGAAATGTG GCCCATGAAG GTGAACATTT CGGTGATGAA GATGATATGA TTTTTGTCAA 17940

TATGGAATTT TCTAATAAGC GTTTTGCCTT GTTAGAATGG GGTTCAGCTT ATCGTTGGGG 18000

TGAACATTAT GTCTTAATCC AAGGAAGCAA AGGTGCCATC CGCTTAGACT TATTCAACTG 18060

TAAAGGAACT CTTAAGCTAG ATGGGCAAGA AAGCTATTTC TTGATTCACG AATCGCAAGA 18120

AGAAGATGAT GATCGGACTC GTATCTATCA TAGTACAGAG ATGGATGGAG CAATTGCTTA 18180

TGGTAAACCA GGTAAACGTA CTCCATTATG GCTATCATCT GTCATTGATA AAGAAATGCG 18240

CTATCTGCAT GAGATTATGG AAGGAGCTCC AGTATCAGAA GAATTTGCAA AACTTTTGAC 18300

AGGTGAAGCT GCCCTAGAAG CAATTGCTAC TGCAGATGCT TGTACCCAGT CTATGTTTGA 18360

AGATCGCAAA GTAAAATTGT CAGAAATTGT AAAATAAATT TTGGTATTCT CCTATTTATA 18420

GGTCGACTTG CTCCTCTGAA AGTACTTTTA GAGGAGCTGT TTGACTTTGC TAGTTTTTGA 18480

AACTGAAATC TATTATACTA CAAACTATTG AAAGCGTTTT AATTTTAAGG TATAATAATC 18540

TCATAGAAAT AAAGAAAAGG AGGAAAGAGG ATGCCACAGA TTAGCAAAGA AGCCTTGATT 18600

GAGCAAATCA AAGATGGAAT CATCGTTTCT TGTCAGGCTC TTCCTCATGA ACCGCTTTAT 18660

ACAGAAGCGG GAGGGGTGAT TCCCTTGCTG GTCAAAGCGG CTGAGCAAGG TGGAGCAGTC 18720

GGTATCCGAG CAAACAGTGT TCGCGATATC AAGGAAATTA AGGAAGTCAC TAAACTTCCA 18780

ATCATTGGGA TTATCAAACG TGATTATCCA CCTCAGGAAC CCTTCATCAC GGCTACTATG 18840

AAAGAAGTTG ATGAATTGGC AGAACTGGAC ATCGAGGTGA TTGCTCTGGA TTGTACCAAG 18900

CGTGAACGCT ACGATGGTTT GGAAATTCAA GAGTTCATTC GTCAGGTTAA GGAGAAATAT 18960

CCTAATCAGC TTTTGATGGC TGATACTAGT ATCTTCGAAG AAGGGCTAGC AGCTGTAGAA 19020

GCAGGAATTG ACTTTGTCGG AACAACCTTA TCAGGCTACA CATCCTACAG TCCAAAAGTA 19080

GACGGTCCAG ATTTTGAATT GATTAAGAAA CTCTGTGATG CTGGTGTAGA TGTCATTGCA 19140

GAAGGAAAAA TTCATACACC AGAACAAGCC AAACAAATCC TTGAATATGG AGTGCGAGGC 19200

ATCGTTGTTG GTGGCGCCAT TACTAGACCA AAAGAGATTA CAGAACGCTT CGTTGCTAGT 19260

CTTAAATAAG ATGTGAGGGG GAGTTTTATG TTTAAAGTTT TACAAAAAGT TGGAAAAGCT 19320

TTTATGTTAC CTATAGCTAT ACTTCCTGCA GCAGGTCTAC TTTTGGGGAT TGGTGGTGCA 19380

CTTTCAAACC CAACCACGAT AGCAACTTAT CCAATACTAG ACAATAGTAT TTTTCAATCA 19440 

ATATTCCAAG TAATGAGCTC TGCAGGAGAG GTTGTATTCA GTAATTTGTC ACTACTTCTC 19500 

TGTGTGGGAT TATGTATTGG CTTAGCGAAA CGAGATAAAG GAACCGCTGC GTTAGCAGGA 19560 

GTAACTGGTT ACTTAGTTAT GACTGCAACG ATCAAAGCTT TGGTAAAACT TTTTATGGCA 19620 

GAAGGATCTG CAATTGATAC TGGAGTTATT GGAGCATTAG TTGTCGGAAT AGTTGCCGTA 19680 

TATTTGCACA ACCGATATAA CAATATTCAA TTACCTTCCG CTTTAGGATT CTTTGGAGGT 19740 

TCACGCTTCG TTCCTATTGT TACATCGTTC TCTTCTATCT TGATTGGCTT TGTCTTCTTT 19800 

GTTATTTGGC CACCTTTCCA ACAACTTCTT GTTTCTACAG GTGGATATAT TTCTCAGGCG 19860 

GGTCCAATTG GAACTTTTCT ATATGGATTT TTAATGAGAC TTTCTGGAGC AGTAGGCTTA 19920 

CATCATATAA TTTACCCTAT GTTTTGGTAT ACTGAACTTG GTGGTGTTGA AACTGTTGCA 19980 

GGACAAACAG TGGTTGGAGC TCAAAAAATA TTTTTTGCTC AATTAGCCGA TTTGGCCCAT 20040 

TCTGGATTAT TTACAGAAGG AACAAGGTTT TTTGCAGGTC GTTTCTCAAC AATGATGTTC 20100 

GGTTTACCGG CTGCCTGTTT AGCGATGTAC CATAGTGTTC CTAAAAATCG TCGTAAAAAA 20160 

TACGCGGGTT TGTTTTTTGG AGTTGCTTTA ACATCTTTTA TTACCGGTAT TACAGAACCA 20220 

ATTGAATTTA TGTTTCTATT CGTCAGTCCG GTTCTATATG TTGTTCACGC ATTCCTTGAT 20280 

GGTGTTAGCT TCTTTATTGC AGACGTCTTA AATATTTCAA TAGGAAACAC ATTTTCAGGA 20340 

GGTGTAATCG ATTTCACTTT ATTTGGAATT TTGCAGGGGA ACGCTAAGAC GAATTGGGTT 20400 

CTTCAGATTC CATTTGGACT TATTTGGAGT GTTTTGTATT ATATTATTTT TAGATGGTTC 20460 

ATTACTCAAT TCAACGTTCT AACGCCAGGG CGAGGAGAAG AAGTAGATTC TAAAGAAATT 20520 

TCTGAATCCG CAGATTCAAC TTCAAATACT GCAGATTATT TAAAACAGGA TAGCCTACAA 20580 

ATTATCAGAG CCTTGGGTGG ATCAAATAAT ATAGAAGATG TAGATGCTTG TGTGACACGT 20640 

TTACGTGTAG CTGTAAAAGA AGTTAATCAA GTTGATAAAG CACTTTTAAA ACAAATTGGT 20700 

GCAGTTGATG TCTTAGAAGT GAAGGGTGGC ATTCAAGCAA TCTATGGAGC AAAAGCAATC 20760 

TTATATAAAA ATAGTATTAA TGAAATTTTA GGTGTAGATG ATTAAGTACT TACTGACTTA 20820 

ATAAAAAACA GAGGAGAGTG ATGGATGAGT AGGATGAAAT GAAATCGCAT ACAAGAAATA 20880 

AAGAACTCAT TATCCAAGTT GGATACGCTT ATTACATAGG AGAATACAAA TGAAATTTAG 20940 

AAAATTAGCT TGTACAGTAC TTGCGGGTGC TGCGGTTCTT GGTCTTGCTG CTTGTGGCAA 21000 

TTCTGGCGGA AGTAAAGATG CTGCCAAATC AGGTGGTGAC GGTGCCAAAA CAGAAATCAC 21060






TTGGTGGGCA TTCCCAGTAT TTACCCAAGA AAAAACTGGT GACGGTGTTG GAACTTATGA 21120

AAAATCAATC ATCGAAGCGT TTGAAAAAGC AAACCCAGAT ATAAAAGTGA AATTGGAAAC 21180

CATCGACTTC AAGTCAGGTC CTGAAAAAAT CACAACAGCC ATCGAAGCAG GAACAGCTCC 21240

AGACGTACTC TTTGATGCAC CAGGACGTAT CATCCAATAC GGTAAAAACG GTAAATTGGC 21300

TGAGTTGAAT GACCTCTTCA CAGATGAATT TGTTAAAGAT GTCAACAATG AAAACATCGT 21360

ACAAGCAAGT AAAGCTGGAG ACAAGGCTTA TATGTATCCG ATTAGTTCTG CCCCATTCTA 21420

CATGGCAATG AACAAGAAAA TGTTAGAAGA TGCTGGAGTA GCAAACCTTG TAAAAGAAGG 21480

TTGGACAACT GATGATTTTG AAAAAGTATT GAAAGCACTT AAAGACAAGG GTTACACACC 21540

AGGTTCATTG TTCAGTTCTG GTCAAGGGGG AGACCAAGGA ACACGTGCCT TTATCTCTAA 21600

CCTTTATAGC GGTTCTGTAA CAGATGAAAA AGTTAGCAAA TATACAACTG ATGATCCTAA 21660

ATTCGTCAAA GGTCTTGAAA AAGCAACTAG CTGGATTAAA GACAATTTGA TCAATAATGG 21720

TTCACAATTT GACGGTGGGG CAGATATCCA AAACTTTGCC AACGGTCAAA CATCTTACAC 21780

AATCCTTTGG GCACCAGCTC AAAATGGTAT CCAAGCTAAA CTTTTAGAAG CAAGTAAGGT 21840

AGAAGTGGTA GAAGTACCAT TCCCATCAGA CGAAGGTAAG CCAGCTCTTG AGTACCTTGT 21900

AAACGGGTTT GCAGTATTCA ACAATAAAGA CGACAAGAAA GTCGCTGCAT CTAAGAAATT 21960

CATCCAGTTT ATCGCAGATG ACAAGGAGTG GGGACCTAAA GACGTAGTTC GTACAGGTGC 22020

TTTCCCAGTC CGTACTTCAT TTGGAAAACT TTATGAAGAC AAACGCATGG AAACAATCAG 22080

CGGCTGGACT CAATACTACT CACCATACTA CAACACTATT GATGGATTTG CTGAAATGAG 22140

AACACTTTGG TTCCCAATGT TGCAATCTGT ATCAAATGGT GACGAAAAAC CAGCAGATGC 22200

TTTGAAAGCC TTCACTGAAA AAGCGAACGA AACAATCAAA AAAGCTATGA AACAATAGTC 22260

CTTAGTTATT CTATAAAAAG TAGTTTTTTA AAGAACCTAA GAGTGTATAC CCCCTTTTCe 22320

CTCTACACAG ATAGTGTAAG AAAAGGGGGC TTTTGTTTAA AATGTAAGAA ACTGTCACGA 22380

AATTAAAATG AAGTTCTTAC ATAAGCGAAT CATAAAAAAT TTCATTTTGA TTTTAAAACA 22440

GTTCAAGAAA GTCAAAAAAT TATTCTATTT GAAAGAGAGG TGCCGACTGT GAAAGTCAAT 22500

AAAATCCGTA TGCGGGAAAC AGTGATTTCC TACGCTTTCC TAGCACCAGT ATTATTCTTC 22560

TTTGTCATCT TTGTGTTGGC TCCGATGGTG ATGGGCTTCA TTACAAGTTT CTTTAACTAC 22620

TCAATGACTA AATTTGAGTT TGTAGGCTTG GATAACTATA TCCGTATGTT TAAAGATCCT 22680

GTCTTTACAA AATCTCTGAT TAACACAGTT ATTTTGGTTA TTGGATCTGT ACCAGTTGTT 22740

GTTCTATTCT CACTCTTTGT AGCATCTCAG ACCTATCATC AAAATGTCAT TGCCAGATCC 22800

TTCTACCGTT TCGTCTTCTT CCTTCCTGTT GTAACGGGTA GTGTTGCCGT GACAGTTGTT 22860

TGGAAATGGA TTTATGACCC ACTATCAGGG ATTCTAAACT TTGTCCTTAA GTCCAGCCAC 22920

ATCATCAGCC AAAACATTTC TTGGTTGGGA GATAAAAACT GGGCATTGAT GGCGATTATG 22980

ATTATTCTCT TGACCACTTC AGTTGGTCAG CCCATCATCC TTTATATCGC TGCCATGGGG 23040

AATATTGACA ATTCACTGGT TGAAGCGGCG CGTGTTGATG GTGCAACTGA GTTTCAAGTT 23100

TTTTGGAAGA TTAAATGGCC AAGCCTTCTT CCAACAACTC TTTATATTGC AATCATCACA 23160

ACAATTAACT CATTCCAGTG TTTCGCCTTG ATTCAGCTTT TGACATCTGG TGGTCCAAAC 23220

TACTCAACAA GTACCTTGAT GTACTACCTT TACGAAAAAG CCTTCCAATT GACAGAATAC 23280

GGCTATGCCA ACACAATTGG TGTCTTCTTG GCAGTCATGA TTGCTATCGT AAGCTTTGTT 23340

CAATTTAAAG TACTTGGAAA CGACGTAGAA TACTAAAGAA AGGAGACAGC TATGCAATCT 23400

ACAGAAAAAA AACCATTAAC AGCCTTTACT GTTATTTCAA CAATCATTTT GCTCTTGTTG 23460

ACTGTGCTGT TCATCTTTCC ATTCTACTGG ATTTTGACAG GGGCATTCAA ATCACAACCT 23520

GATACAATTG TTATTCCTCC TCAGTGGTTC CCTAAAATGC CAACCATGGA AAACTTCCAA 23580

CAACTCATGG TGCAGAACCC TGCCTTGCAA TGGATGTGGA ACTCAGTATT TATCTCATTG 23640

GTAACCATGT TCTTAGTTTG TGCAACCTCA TCTCTAGCAG GTTATGTATT GGCTAAAAAA 23700

CGTTTCTATG GTCAACGCAT TCTATTTGCT ATCTTTATCG CTGCTATGGC GCTTCCAAAA 23760

CAAGTTGTCC TTGTACCATT GGTACGTATC GTCAACTTCA TGGGAATCCA TGATACTCTC 23820

TGGGCAGTTA TCTTGCCTTT GATTGGATGG CCATTCGGTG TCTTCCTCAT GAAACAGTTC 23880

AGTGAAAATA TCCCTACAGA GTTGCTTGAA TCAGCTAAAA TCGACGGTTG TGGTGAGATT 23940

CGTACCTTCT GGAGTGTAGC CTTCCCGATT GTGAAACCAG GGTTTGCAGC CCTTGCAATC 24000

TTTACCTTCA TCAATACTTG GAATGACTAC TTCATGCAAT TGGTAATGTT GACTTCACGT 24060

AACAATTTGA CCATCTCACT TGGGGTTGCG ACCATGCAGG CTGAAATGGC AACCAACTAT 24120

GGTTTGATTA TGGCAGGAGC TGCCCTTGCT GCTGTTCCAA TCGTCACAGT CTTCCTAGTC 24180

TTCCAAAAAT CCTTCACACA GGGTATTACT ATGGGAGCGG TCAAAGGATA ATACTCTGCG 24240

AAAATCTCTT CAAACTACGT CAGCTTCACC TTGCCATACT TAAGTATTGC CTGCGGTTAG 24300

CTTCCTAGTT TGTTCTTCAA TTTTCATTGA GTATAGGAAA ATCAATCTAT CAAGATACAG 24360

AAGTATATTT TATAGATTTA GAGAATATAG AGGTTATAAG TGTCTACAAA ATGGAGGGTA 24420

TGCAGTTACT TTATGAAGTT TTGTCAGACA CTTATAAACT TAAGAATGGT TTTAGTTAAC 24480

TATCAGAAAC GAAGGAAAGA GTATGATTTT TGACGATTTG AAAAACATCA CCTTTTACAA 24540

AGGGATTCAT CCTAATTTAG ACAAGGCTAT CGACTATCTC TACCAACATC GTAAGGATTC 24600

TTTCGAATTA GGAAAGTATG ATATTGATGG AGATAAAGTC TTTCTAGTTG TTCAGGAAAA 24660

TGTCCTCAAT CAAGCTGAAA ATGATCAATT TGAGTATCAT AAGAACTATG CAGATTTGCA 24720 

TTTGCTGGTA GAAGGACATG AATATTCGAG CTACGGTTCA CGTATCAAAG ACGAGGCAGT 24780 

AGCATTCGAC GAAGCGAGTG ACATTGGCTT TGTTCATTGT CATGAACACT ACCCACTCTT 24840 

GTTGGGTTAT CACAATTTTG CGATTTTCTT CCCAGGTGAG CCACATCAGC CAAATGGTTA 24900 

TGCAGGCATG GAAGAAAAGG TTCGAAAATA TCTCTTTAAA ATTTTGATTG ATTAAAAATA 24960 

GGATGAATTG TTTTTTTGTA AAGCTTTGAT AATACTCTAC CATGAAATTG ATCTTTGTGA 25020 

GGTAGAGAAA TGAGAATAAA ATATTTAAAA ATTGGTATCT TCTAAGTATG CTGCAAGAGC 25080 

TAGTTTCTTA GATGGACAGG GGATTACAGT TGATGAGATG GCTTGGATAA TTAGGGGCAT 25140 

TGTGAATGCA TTGATTGGTA GATACATAAA ATTAGGTACT TATGCGGCTA AGTATGGTAT 25200 

TAGTATGGCA CGCTCGATCT TAAGTAGGGT AGCTGCAACT GCAGCAGCAA GAGTAGGATT 25260 

ACTGACCAAG ATTTCTGGAT GGATTTTACG AGTAGCTGTG AATGTAGCTG ATGTATATGG 25320 

TAATTTTGCC AACAATATTG CTGCAGCTTG GGATGCATAT GATAAAATTC CTAACAATGG 25380 

TCGTATAAAC TTTTAAAATG CGAGAATGAA AGCACTTTGT ATTTTTTTAT TGAATATGTT 25440 

AGCTTGGACA GTGCTTGCAA TGATAATTCG TGGAGGGCTA GATGGATTTG ATAGGCATAC 25500 

TTGGAGTACT ATTTTAATTG CGTCGCTGTT CGGGGTATAT GATTATAAGC CCATAGATAA 25560 

AAATAGAAAA AAGTCCAAAA GAAAAAATAG ATTTGTTCAT GGTAGGGACT TATGAAAGCT 25620 

TTACTGACAA AAAAGAAAAC AGTTTACAAA GAAAAATGAT GGAGGAGCAA ACATGGCACA 25680 

AAAAGGAGTA AGCCTTATCA AGGCAGCATT TGATACAGAT AACTTTCTCA TGCGTTTTAG 25740 

TGAGAAGGTC TTGGACATCG TGACAGCCAA TCTTCTTTTT GTCGTCTCTT GTTTACCCAT 25800 

CGTGACGATT GGAGTGGCTA AAATCAGCCT CTACGAGACC ATGTTCGAAG TTAAGAAGAG 25860 

CAGACGGGTG CCTGTTTTTA AAATCTATCT AAGATCTTTC AAGCAAAATC TGAAACTAGG 25920 

TCTTCAGCTG GGTTTAATGG AGTTAGGAAT TGTGTTTCTT ACCCTTTCAG ATCTCTATCT 25980 

TTTCTGGGGT CAAACAGCTC TGCCCTTCCA ATTGCTGAAA GCCATTTGTT TAGGTATTCT 26040 

GATTTTTCTT ACTATCGTGA TGCTGGCTAG TTACCCTATC GCGGCACGTT ATGACCTATC 26100 

TTGGAAAGAA ATTCTTCAAA AAGGATTGAT GTTGGCTAGT TTTAACTTTC CTTGGTTCTT 26160 

CCTCATGTTA GCCATTCTTG TCCTCATTGT GATGGTTCTT TATCTGTCCG CCTTCAGTCT 26220 

ACTCTTAGGT GGCTCAGTCT TCCTACTTTT TGGGTTTGGA CTATTGGTCT TTATCCAGAC 26280 

TGGATTGATG GAGAAAATTT TCGCAAAATA CCAATAGGAG CTTTATTTCT GAAACTACTT 26340 

TCAAAGGCTC CAAACGCTAT TCTATAAGCG AGAAACTAAA ATCGG 26385

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2716 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

CCTGCCCGCA TTGCCCTAGG CATTAAGTAA ACATATAAAA GCATGTGAGA GACTGTTGGA 60 

AAAGCGAGGA AATTTCCCCT CTTTTCCTCT AGTCTCTCCT TTCTTTTGCT GATTTTATTC 120 

AAAGAAAATG ATATAATAGT AGTTATGGAG AAAAAGAAAT TACGCATCAA TATGTTGAGT 180 

TCAAGTGAGA AAGTAGCAGG ACAGGGAGTT TCAGGTGCTT ACCGTGAATT AGTTCGTCTT 240 

CTTCACCGTG CTGCCAAGGA CCAATTGATT GTTACAGAAA ATCTTCCAAT CGAGGCAGAT 300 

GTGACTCACT TTCATACGAT TGATTTTCCC TATTATTTAT CAACCTTCCA AAAGAAACGC 360 

TCAGGGAGAA AGATTGGCTA TGTGCATTTC TTGCCAGCTA CACTTGAGGG AAGTTTGAAA 420 

ATTCCATTTT TCTTAAAGGG AATTGTGAAA CGCTATGTAT TTTCTTTTTA CAACCGGATG 480 

GAGCACTTGG TTGTGGTCAA TCCTATGTTT ATTGAGGATT TGGTAGCAGC TGGTATTCCA 540 

CGTGAAAAAG TGACCTATAT TCCTAACTTT GTCAACAAGG AAAAATGGCA TCCTCTACCA 600 

CAAGAAGAGG TAGTCAGACT GCGCACAGAT CTTGGTCTTA GTGACAATCA GTTTATCGTA 660 

GTAGGTGCTG GGCAAGTTCA GAAACGTAAA GGGATTGATG ACTTTATCCG TCTGGCTGAG 720 

GAATTGCCTC AGATTACCTT TATCTGGGCT GGTGGCTTCT CTTTTGGTGG TATGACAGAT 780

GGTTATGAAC ACTATAAGAA AATTATGGAA AATCCCCCTA AAAATTTGAT TTTTCCAGGC 840 

ATTGTATCGC CAGAGCGGAT GCGCGAATTG TATGCTCTAG CGGATCTTTT CTTGTTGCCT 900 

AGTTACAATG AGCTCTTTCC TATGACTATT TTAGAAGCTG CGAGTTGTGA GGCTCCTATT 960 

ATGTTGCGTG ATTTAGATCT CTATAAGGTG ATTTTGGAGG GAAATTATCG GGCGACAGCG 1020 

GGTAGAGAAG AGATGAAAGA GGCTATTTTG GAATATCAAG CAAATCCTGC TGTCTTAAAA 1080 

GATCTCAAAG AAAAGGCTAA GAATATTTCC AGAGAGTATT CTGAAGAGCA TCTGTTACAA 1140 

ATCTGGTTGG ACTTTTATGA GAAACAAGCC GCTTTAGGGA GAAAGTAAAA AGTGAGGTAA 1200

TCTATGCGAA TTGGTTTATT TACAGATACC TATTTTCCTC AGGTTTCTGG TGTTGCGACC 1260 

AGTATTCGAA CCTTGAAAAC AGAACTTGAA AAGCAGGGAC ATGCTGTTTT TATCTTTACG 1320 

ACGACAGATA AGGATGTCAA TCGCTACGAA GATTGGCAAA TTATCCGCAT TCCAAGTGTT 1380

CCTTTCTTTG CTTTTAAGGA TCGTCGCTTT GCCTACCGAG GTTTTAGCAA GGCACTTGAA 1440

ATTGCTAAAC AGTATCAGCT AGATATTATC CATACTCAGA CAGAATTTTC TCTTGGCCTG 1500 

TTGGGGATTT GGATTGCGCG TGAATTGAAA ATTCCAGTCA TCCATACCTA TCACACCCAG 1560 

TATGAAGACT ATGTCCATTA TATTGCTAAG GGGATGTTGA TCCGGCCGAG TATGGTCAAG 1620 

TATCTGGTTA GAGGTTTCCT GCATGATGTG GATGGGGTTA TTTGCCCTAG TGAGATTGTC 1680 

CGTGACTTGC TATCTGATTA TAAGGTCAAG GTTGAAAAAC GGGTCATTCC TACTGGGATT 1740 

GAATTAGCCA AGTTTGAGCG TCCGGAAATC AAGCAGGAAA ATTTGAAAGA ACTGCGTAGT 1800 

AAACTAGGGA TTCAAGATGG TGAAAAGACG TTGCTTAGTC TTTCGAGAAT CTCCTATGAA 1860 

AAAAATATTC AAGCAGTTTT AGCAGCCTTT GCTGATGTTC TGAAAGAGGA AGACAAGGTT 1920 

AAACTGGTAG TAGCTGGGGA TGGCCCTTAT CTGAATGACC TCAAAGAGCA AGCCCAGAAC 1980 

CTAGAGATTC AAGACTCAGT CATCTTTACA GGGATGATTG CTCCTAGTGA GACGGCTCTT 2040 

TACTATAAAG CGGCGGATTT CTTCATTTCG GCATCGACAA GCGAAACGCA AGGTTTGACC 2100 

TACTTGGAAA GCTTAGCCAG TGGAACACCT GTCATTGCTC ACGGAAATCC TTATTTGAAC 2160 

AACCTCATCA GTGATAAAAT GTTTGGAACC TTGTACTATG GAGAACATGA TTTGGCTGGT 2220 

GCTATTTTGG AAGCCCTGAT TGCAACACCA GACATGAACG AGCATACCTT ATCAGAGAAA 2280 

TTGTATGAGA TTTCAGCTGA GAACTTTGGG AAACGAGTGC ATGAGTTTTA TCTGGATGCC 2340 

ATTATTTCAA ATAACTTCCA GAAAGATTTG GCTAAAGATG ATACGGTCAG TCAGCGTATC 2400 

TTTAAGACAG TTTTGTATCT TCAGCAACAG GTGGTTGCTG TACCTGTAAA AGGATCTAGA 2460 

CGCATGTTGA AGGCTTCAAA AACACAGTTG ATCAGTATGA GAGACTATTG GAAAGACCAT 2520 

GAAGAATAGA AAGAGGAACA GCTATGAAAA AAACAATTAA TGAGAAGCGG TCGTGATAAA 2580 

AAGATTGCGG GTGTTTGTGC TGGGGTGGCC CATTATCTGG ATATGGATCC GACTATCGTT 2640 

CAAGTCATTT GGGGTGTTCT TACTTGCTGT TACGGAGCTG GAATTGTAGC TTACATTATT 2700 

TTATGGATTA TCGCGA 2716 
(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13926 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

CTTTGGTTTT GCCTTATTCA AGACATGAGG GCCATCAGGA ATGATCTGAA ACTGCGAATC 60

TGTTAACAGT CTATGGAGAG CTTTCATAGA ACTAAGATTC GGTTTATCTT TGCTGCCACA 120

AATTAGTAAG GTTGGATAAG GGTAAGTTCC TGCTATATCC GTTAAATCAA GTGTCTTCAA 180 

CTCCTCAGAA ACTCCGACCA TAAGAGTCTT GTCTGCTCCC TGTTTTTCAA ATACTCTTTT 240 

GGGAAGTAGT TTAAAAATCA GCAATTGAAG ATAAAATAGG ATATTCCCTG CTAATTTAAG 300 

CGGGCATCCT GACAGAATCA AAGCTCGAAG ATTTGGTAAA TCGTAACTGG AAAGTTCTAG 360 

TGTCAGGGCA GCACCTAAGG ACAATCCAAT CAAAACAAAA GGTTCTGTCT CTTGAGCTAG 420 

GTGCTGATAA ACTCGCTCTT TAGCTTGTTG ATAGTTACTA ACTCCAGAAG GAAATAACTC 480 

GATAGCCTCA GAAGGATAAT CTGTCAGTAG ATTCCGAACT TCTTTCCAAG ACTCTGCTGA 540 

CTGCCCTAAC CCATGCAAAA ATATTAATTT CATCTAGTTC TCCTCAAGGC TTAATTCATA 600 

CAAGCCTCTC ACTGCATTAC AGCCGTAAAT AGCTTCTGCT TGGGTTAAAT CTGCCAAGGT 660 

CAAGACTTTC TCTTCTACCT GTCCTGTTTC TAGCAAATGC TGACGGTAAA TTCCTGGCAA 720 

GATTCCAAGT CGGATAGGCG GTGTGTAGAG TTTTCCAGCG ATTTTCAGAA CCAAATTTCC 780 

TATAGAGGTT TCAAGCAGTT CTCCTGACTT ATTGTGGTAA ATCTTCTCTT GTTCTCCTAG 840 

GCTCAAATGC GGTCGGTGAG TGGTTTTAAA GTAGGTAAAG GATTGATTCA AAGCAGCTTC 900 

CTGAAGACAG ACTTGGGCCT GACAAAAGCT TGTACTGAGA GGGGTTAATA CTTGACGATT 960 

GACTTCTATC TCTCCAGATT TGCTAAGGCT GATTCGCAAG CGGTAATCTC GATTAGCTTC 1020

ACAATCCTGA CACTCTTCCT CAATCTTGTG TCCCAAGTCT TCTGCATCAA AAGGAAAAGC 1080

AAAATAACGA CTAGCTTTTC TCAGCCTTTC CAGATGTTGT TCTTCAAACA TCAGTTGTTT 1140 

TTGGCTGATT TTTCCAGTTG TAATTAATTG GAAGCGAGCT TGTTTACGAT AGAGAACTGC 1200 

TGCCTTTTGA TGAACCTCTC GGTATTCAGA TTCCCATGTG CTATCCCAAG TAATCCCTCC 1260 

GCCAACTCCA TAAATGGCTT GACCTTTGTG AAGTTGAATG GTACGAATGG CCACATTAAA 1320 

AATCCGTCGT CCATTTGGAA GCAAGAGACC AATCGTTCCA CAGTAGACTC CACGCGGTTG 1380 

AGGCTCCAAG TCCTTGATAA TCTCCATTGT CGCAATTTTC GGTGCACCCG TTATGGAACC 1440 

ACAAGGAAAG AGTGAGCGGA AGATTTCAAC AAGGTCCACA TCCTCTCGCA ACTGACTCTT 1500 

GATGGTCGAA GTCATCTGCC AAACAGTTGA ATACTGCTCT ACCTGACACA GACGCTCCAC 1560 

GTGCTCGCTC CCAACTTCAG AAATACGGTT CATATCATTG CGCAAGAGGT CCACAATCAT 1620 

CATATTTTCA GAGCGATTTT TGGGATCCTG TTCCAACCAA CTGGCCTGTT CAAGATCTTC 1680 

TTGGTCAGTT ACCCCACGCT GAGTCGTCCC CTTCATTGGT CGTGTTGTCA AGTCGCGATC 1740 

ATTTTGCTCA AAAAAGAGCT CTGGGCTCAT GGAAATCACT GTCATCTCGT CATGTTCCAC 1800 




ATAGGCATTG TAGCCCGCCT CCTGCTCTAC CACCATACGA TTGTAGATGG CAAAAGGATT 1860 

GGCATTTAAC TTTTGCTTAA GTTGGACGGT GTAGTTGACC TGATAGGTAT CTCCCTGCCG 1920 

TAAATGATGG TGAATTTGGG CAATGGCCTT TTCATAGTCT GCTGCAGACG TTACTTCCTG 1980 

CCAATTTGAG GGCAAATCAA TATCCTCATA AGTCAGAGGA ATAGGGGAAG TTTCTACGAT 2040 

ATCATGAACA GTAAAGTAAA GCAGGTACTC TCCCAGTAGG GGATCCTTGT GAACTGCTAA 2100 

TTTTTCCTCA AAAGCAGGTG CAGCCTCGTA GCTGACATAC CCCACCACAT AATAACCTTG 2160 

CTCTTGGTAG CTTTCCACTT GTGCCAGCAA ATCTGCCACT TCTTCTACAT TTCTCGTTTT 2220 

CAACTCTTTA ATAGGCTGGG TAAAGGTATA TCTCTCCCCC AAAGTCCTAA AATCAATCAC 2280 

TGTTTTTCTA TGCATACCTT AAGTATAGCA TAAAATAAGA AAACCCTCAT CCGCAAAGCA 2340 

GATGAGAGAT TTCAATTATT TAAAGATTGA AGTTTTAAAG CTATTTGTTT GTTGAAGAAG 2400 

TTTCTTATAA ACAGCTTCTT TTAATTTAAC TGTATTATTC ATAGATACTG TTTTATTACC 2460 

GTTTGCTTCT TGTTTAAGAG TTTCGGCATC TTTTTTAACA GCTTCTTTAA ACAATGTCAG 2520 

TAAATCATCG TATGATGAAA CGGAAGAACC ATTTACTTCG AATGTTGTTA ATCCTTTCGT 2580 

TGCTTTATCT TTAACTTCTT TGAAGTAAGC TTTTTTAAAT TCTTCAATAG TATTAAATGT 2640 

ATTGTTAGAT ATTTTCTTGA TAATATATTC ATCACTTAGA ACAGACTCAC CATCTGTTTT 2700 

AGATTGTTGT TTATATTTAT TTGAAGCATA ACCTAAGAAC CCATTTTCGT ATCCGTAGTA 2760 

ACCCCATAAT CTAAAAGCAT TATGTTTGAA TGAAACAGCT CCAGGAGCAC CTTTACTAGT 2820 

ATTACCTCCG TAGATACCGG TCATCATTCT AACACCTACA TAAGGTGATT GATCGTTATA 2880 

GCTAATTGCT TCGGGTTTAT AGATACCATT ACCTGGATTG CGATTAGTCA TTAATTGTTG 2940 

ATCAACTAAA TCATTAACAG ATTGAATATT TAATTCATTT TTCTCTTCTT GACTTAGATT 3000 

TCGAATTTTA TCCCATTGAT TTAATTTATT GTTATCACGG TATTCTCTAT CTATTTTTTT 3060 

GAACCATGCA CTATTTAAAT CTTTATTTTG TTGAGAAATC ACAGATTCAG CCTCAATTTC 3120 

ATCAAGAAGA GTTAAAGTGT CATTATAACC CTTCATATAT CTATTAATAT CTTCTCGTGT 3180 
TTTTAGAGTT TTTGGATCTG TAATATACCA CTGATTCCCA TCATTTTTGC GTTTAAATAC 3240

CATATTAATA CCTAAAGAAC CAAACTCATC AAATCCACTA CCAGTAACAG GAGTTTGTAG 3300 

CATACCCTGA GCATATGCTT CAGCATCAGT ACCTTCACGG TGTCCAAAGC CACCTAAGTA 3360 

AATCGCACGG TCGTTGACGT GTGTTGTTTC ATGTGTGTAA ACTGAAATAC CGTATTCACC 3420 

AACCATTTCT AAATGAACAT ATTTTACATC AGTTCTAATA TCATCAGAGT TAGGATATAT 3480 

AGCAGCATAA GCTCCTGTTC CATTATAATT ATAATACTTA TCCATAGGAC CAAAGAATTC 3540 

TCTAAGAGGA GTATATACTT TGTCGGTATT ATAGCGGCCA TATTTTTCAA CCCATCCACC 3600 
AGGAGCGTTA TAACCTTCCC AAATAGGAAT AACAGCATCT CTTAGTAGTC GTTGTTTAAC 3660

GTTATCAGAC GCTAGACGAT ACCAGAAATC ATAATAGTTT CTATAACCAT CTGCAGCTTT 3720 

GTTAACGATA TCTTTAATAT CTTCTAATGA TTTTTTACCT AATCGCTCTG CACTACCAAA 3780 

GGCAATTGCA TTATAATTTG AAATTAAATA AAGATGTGCT TTATCAATAT TCAGTAGTGG 3840 

GAGTATAGTA TTTCTAAGGT GACTTCGTTT TAAATTATCG AATGCACGAT GTTTAGAATT 3900 

TTTAATTTCT TCGACCTCAG AAGCGCGTTC TGCGATGTAG ACATGGTCTT CTGTAGCATC 3960 

AATAAACCAA TCGTTCATAT TGTCTATATT TGTGAACAAT TGTCTATTAT AATTTAAAAA 4020 

TGCATCTAAA TTACCTGATT TAGTATATTT AGCCAATACT TGACCGAATG CGTCGAATGT 4080 

ACGTGAACCT TTAATGTTGT TCTCTTTAGA ACCGATTTCA ATTAATCTGT CTAATACGCT 4140 

AACTTTTTCA CCATAGAAAT CTGGTTTGAA TAGCATTAAT TCTTTAATAT TAACATCACC 4200 

AAATTTAACT CCATAGTAAC GATTTAGGTA AGTTAAACCT AGTAATAAAG CTGCTTTGTT 4260 

TTTCTCGACT TTATCACGAA TCATTTGACG AGCAGCTGGA GAATCATTTA GTTGATGTTC 4320 

TTCGTTTTGA ACTAATTTTG TGATTAGGTT TGTTAAGTTT TCTTTAACAT CTGTGAAGCT 4380 

TTCTTCTAAA TATAAATCTT TGATTGCATT AACTCTATAG TCACCTAATC GATTTAGATG 4440 

CTGATACATC GTTTGAGACT GAAGCTCTAC TGATTCTAAA ATAGATTTTA TATCATTAAC 4500 

AAGAGTAGTG TTATCTTTTT GAACGATATT AGGTGTATAT TTAATTCCTA AGTCAGTTAT 4560 

AGTATATTCT TTTACATTAC TTAAACCTTC ACTGCTAGAA GACAAGTTAA AGTAATCTTT 4620 

TGTACCGTCC GCATAGTGAA CAATAATTTT ATTAGCTTCA TCTAGGTTTG TGATAAACTC 4680 

ATTGTTGTTC ATCGCGGTAA CAGAAAGAAC TTCTTTAGTA TTTAGATGGT GTTCTTTATT 4740 

TAATTTATTA CCTTGATATA CAATATAATC TTTATTGTAG AATGGTATTA ATTTTTCAAG 4800 

ATTTTTATAG GCTTGGTTAT ATTCAGCGTT ATAATCTTGA ATACTAGAAT AGGCTTTTTC 4860 

TTCATTAAGT TTTGCAAGAG GAGATAGATC ACTTTCTAAT TTATCAGCAG TAATATTGAA 4920 

AGTAGTAACT TTAGCATCAG CTTGTTCTTT AGTTAATTTA GTAAATGTTT TAGATTTCCT 4980

AAATGATCTA TTACCTGACG AATATCCCTC TACCGCATAT AAATCTTTTA TATGAGCACT 5040 

AGCATAATCA GAATCATCAA CGTCGTTAGA GCCGAATAAC TCCTCTCCAC GGATAATCTT 5100 

AGCATAGCTG ACAGAATTAC TTACCGTACC TACAGGCCAA GTCTTACTTG CTATTGCTCC 5160 

AACTTCTACT GGATTTGAAA CATCTATTTT ACCTTTTACA ACCGACTCAG TTAGGAGAGC 5220 

TTTTGTACCA ATAAGATGGT CTAGAGTTAA TCCATAATCT ACTTTAGGAA CTAACAAGCT 5280 

GGCGCGTGTT TTGTTTCCTG TAATAGTAGC ATCAACATAT GCTTTTCTAA CAATTCCTCT 5340 
ATAGTTTGTA CCTGCAATTC CCCCTGTATG AGAGCCATTT CCACTTGTAG AGTGTAGTTT 5400

GCCAAAGAAA GCAACATTTT CAATACGAGT TCCATCATTC ATATTATTTA CAAATCCAGC 5460 

AACATTATTA CGACCTGAAA GTGTGCCTGT AATTTTGACA TTTGTAATAA CTGAAGAACC 5520 

TTTCATAGTA TTGGCTAATG ATGCAATATT ATCTTGACCA GAACGTTCTA TCTCTACATT 5580 

TTCAAAATTC ACATTATTTA TCGTTGCGTT TGTTATCACA TTAAATAATG GATGTTCCAA 5640 

TTCAGTAATA GCAAATTGTT TTCCTTCAGA ACTTAAAAGT TTTCCTGTGA ATTCTTTAGT 5700 

GATATATGAT TTTCCATTAG GAACAACATT TCTAGCGCTC ATTGATTGTC CCAGACGATA 5760 

TTCTTTTGAA GGATCGTTTT GAATAGCTTC CACTAATTCT TTGAAATTAT AATATACATT 5820 

ATCTTCGTGG ACTTTAGGTT TTTCAATATA GTGAACGTAT TCTTCTTCAA ATTTATTATC 5880 

AGCAGTTCTA GAGACTAAAT TGTCTGCGAT TGCTGTAACT TTATATACAG GTGTTCCGTT 5940 

AACCGTAGTT TCTTCTATAT TTTTAACAGC TAGTAATGTA GTTTTCTGAT TATTTGAAGT 6000 

TATTTTTAAA TAATAATTGC TCTTATCATC AGGAATAGTT GTTATCAGTG ATTCATTAGT 6060

TTCTTTTCCA TTTTCGTATT TGATTAAATC TGTACGTTTA ATATTTTTAA GCTCAACTTT 6120

TTTAAGATCT AATTGAATAT TTTGATTTTC TAGAGTTTCA GTTTCTTCAC CGTTACCTCT 6180 

GTCGTAAATC ATAGTTGTAG ATAGGGTGTA TTCTTTGTAG TACTCTAGGT TCTTAAATGC 6240 

AGCGCTTATA GTTTCTGTTG TTACCTTGTC ATCTGTAAGG ACTACAGTAT TAATAACTTC 6300 

TTCTCCTTTT TTCAATTCAG CTGTGATTGA TTTGATTTTT GTTTTGTTTT GATTTTCTAG 6360 

AGTATACTTA GCAACAGCTT CACGTTCCAA TATTTTCTTA TCGGTACTAG TCAATGTTAA 6420 

TATTGGCTTT TCAGATAATT CAACCAATTT TTCAATAGTT GCAGTTAATT TTTCAACAGC 6480 

TTCGTTAACT TCACTTTGTT TAGCATCTGT ATTAGCTGCA ACTTTTTCAG CCTTTGTAAC 6540 

TTCAGTTTGG AGGTTTTGCC AACTTCTATC ACTGTAATGT TCTTTTACCT TTGTTTTTGC 6600 

ATCTGCAATC GTATTGTTTA ATTCAGTTTT ATCAACGTTT AGAGCGTCAA TAGOCGTTTT 6660 

AAGTTTATTT GTCTCGCTAT TTACCTCAGG CTGTTTTACA GGCTCTGAAG CATAGACACC 6720 

TTTTGCAGTT TCTAAAACAG GTCCAAGAGC ATTGTAACTT GCTGTAGAAT AATCAGTAGG 6780 

AGAAACTGAA CTAGCTTTAT CAATTTGATT ATTTAACTCA CTTTTATCAA CTGGTTCTTT 6840 

AGTACCAATA CCCTTTATTT TATCTTCTGG TTTCGGTGTT TCCTCTACAG CCTTCTCTTC 6900 

TTCAGGAACT TCTGGTTGCT TTTCTGGCTC AACTGGTGCC GTTGGTGCCT GTTCGTCTTC 6960 

TCTTGGCGCG ACTGGTTCAC CTGCTTGTTC AACTTTTGGT TCCTCTGTTG GTTCTGTTTG 7020 

TTTTTCTACA GCAGGCGTTT CAACTTTTGG TTGTTCAATA GATTGATTAA CAGTCTCCTC 7080 

TTTTGGTTCT ACAGTTTCTT CAGCCTTGGT ATCTGGAGTT GACTCTTCTT GTTTCGGTGT 7140 
TTCCTCTACA GCCTTCTCTT CTTCAGGAGC TTCTGGTTGC TTTTCTGGCT CGACTGGTGC 7200

CTTTTCGTCT TCTCTTGGCG CGACTGGTTC ACCTGCTTGT TCAACTTTTG ATTCCTCAGC 7260 

TGGTTTGTCT GATGGTTGAC TTTCTGGCTT AACTGCTACT TTTTCCTCTG GTTTTGACTC 7320 

AACTTCTCCA CCTACTTCTT CAACTGGAGC TGGTTCTGCT GAATCTTCTT TCCCCTCTTC 7380 

TACTTTAGGA AGGGTGTCGT CAGTAGGTTT TACCTCCGAT TTTGGTTCTT CCTTTGGACT 7440 

TTCTTCTGTT TTAGGTGCTT CTTCTTTTGG AGCTTCCTCT GTCTCTACTA CTTGGTTTTC 7500 

TGTCCTAGCT TGCTCCTGAT TTGTTATTGA TTGAGGAGTC TCAACTTCGA CCACAGTCAC 7560 

CTCTCCAGGT TTTGCTGAGG TTTCTTCTAA AACAGTGTCC AAGCCAAGCG TTTTGAGGAT 7620 

GTCACCTGAT AGATAACCAA CATAGCGATA GCCCTCCATT TCAACAACAC CCTCTCGACT 7680 

AGCCAGCGCT AGGGTCGCAA CTGGGTCTAC AGCCCCTGCA CTAGGAAGAA CTACCAATCC 7740 

CATAGCTCCA ACTAGAAAGA CGCTAGCAAT TTTCTTTCTC TTGTAGATTA AAAGCAAGCT 7800 

CCCAACAGTC AGCAAACCAA AAGCTGTCAA AACAGATGCT TCTGTCCCTG TTTGAGGCAA 7860 

CTGATCTTTT TGATACACCA AACCATATAC AACTTCATTC CTGTCAGGCT TTCCTGTCTG 7920 

AATTAAATCT TTAGCTTCTT GTGAAATAAT CTCTTTATTT ACATAGTGAT AGGTGGCTGC 7980 

GTCCACTACA GAAGGAGCCA TCAAAAGGCT TCCAAGAAAT ACAGAGCCTA CAACTCCCTT 8040 

AATCTTACGA ATTGAAAAAC GGTCTTTTTT AAACACTTTT ATCTCCTTTA TTCATTCTCA 8100 

AAACTTCCTA ATAGCATCTT GCGGATAGTG CGCACGCGCA CCTCCGATTA ATTTTGGACG 8160 

ACTAGCCAGT GCCGTTACAT GGGCATGACC AATCTCTCTC AAAATAGGGC GAATCGGAAC 8220 

CTGAACATGC TTGACATGCA TGCCAATTGC AGTGTCTCCG ATATCCAATC CAGCATGAGC 8280 

CTTGATAAAT TCAACCTCAA CTGGATCCTG CATAAACTTA AAGGCTGCCA ACTGCCCCGA 8340 

ACCTCCTGCA TGAAGAGTAG GATGGACACT GACAATTTCC AGACCAAACT GCTCTGCCAC 8400 

CTGACGTTCA ACAACGAGAG CCCGATTGAC ATGCTCACAA CCTTGAACTG CTAAATGGAT 8460 

ACCTCTACTA CCTAGAATAT CCAAGATAGT CTCCACTATC AGCTCACCAA TCTCTTGACT 8520 

GGATTCTTTC CCAATATGAC CACCTAGCAC CTCACTAGAA GATAGACCTA AAACAAAAAG 8580 

GGCCCCCTGC TTCAAATTGG TCTTTTCTAA AACATCTTCC ACTACCTGAC GTGTTTCTCT 8640 

TTGAATCTGT GTCTCGTTCA TCTCTGTTAC CTCTGTTGTC ACTCTTCTAT CATACCGTTT 8700 

TTTCTTGTTT TTAGCAAGAT AGACAACCTA GAAAGTTTGC CCAATTACGC ATAAAACTCC 8760 

CAGAATTGAC TGGGAGTTAG CTAGTTTCTA TTCTATTTAT ATATATTTCA ACTTTCGTCC 8820 

CTTTTTGGGG TCTAGAATCA ATCTTCATAT GGTAATTGGC TCCAAAATGA AGTTTGAGCC 8880 
GTTGATCGAC ATTTTGAAGA CCAACTCCCC CACGTTTGAG TTGACTTTGA CTACTATCAC 8940

CAGCATCTTG GAAGCCAACG CCATCATCCT CAATACGGAT GACCAATCCC GAATCCTGTT 9000 

TCTGGACAGA AAGTTTAATA TGGCCCTGAC CTTCCTTTTC CTTAATGCCA TGGTAAAGAG 9060 

CATTTTCTAC AAGGGGTTGT AGGACCAGCT TGGGTAAGAC TAAATTATCA AAGGCAACAT 9120 

TTTCATTAAT TTCGTATTCC AGCTTATCTC CATAGCGTTG TTTCTGGATA AAGAGATACT 9180 

GGCGGACATG ATTGATTTCG TCAGAGAGAC AAATCAAGTC CTTGCCTTGA TTGAGCGCCA 9240 

AGCGGAAATA GGTTGCCAAG GACTTGGTCA CCTGCACCAC TCGCTGACTA TCATGAAATT 9300 

CAGCCATCCA GATGATGGTG TCCAAAGTGT TATAGAGGAA ATGTGGATTA ATCTGGCTCG 9360 

AAAGGGCTTG AAGTTGGTAC TGACGGGTCG TTTCTTCCTG GCTACGAATA GCTACCATCA 9420 

ACTGATCAAT CTGATCCAAC ATAGCATTAA ATTGGCGAGT TACTTCTCTC AGTTCATAGG 9480 

CACCAACTTC CTTGGCACGA AGATTTTGAG CACCAGAAGC AATTTCCAAC ATGGTTTCTC 9540 

TCAAATCCTT CAAAGGAGCA ATCCAGCGTT TAAGACTGAA CCACACTAAG CAGAGACAGA 9600 

CAAGAAGAGA TGTGACACTG GCCCCAAGCA AGGTCCACAA GAGCTGACTC CGAACCTGGT 9660 

CTAACTTTTC CAATGATGAC ACGCCAAGCA CCGTCCAATC AGTTCCTGCA ATCTTCTCTT 9720 

GACTGACGTA GGATTTGTGA CCAGGAGTAT AACCCTGACC TGTATCGATG TAGGGTTTCA 9780 

TAGCCTCCAT TTTGCTAGAC GAACTATAAA CTGTGTGTTG AGGATGGTAG ACAAATTCAT 9840 

GGTTTTCATT GATAATGAAG GCAAAGCCCT GCTGCCCCAA CTGGAGTTGA TTGAGATAGG 9900 

CTTCCAGAGT TTCATAAGAA ATATCCAAAC GAAGCACACC AAGATTGGCT CCCTTTGCAT 9960 

CAACAAGTTC TTGAGTGACA GAAATGACCC ACTGACTATC TGATTTACGA GCTGGAGTCA 10020 

AAACAGGCAT AGCTCCCTGA TGAATGGCCT TTTGGTACCA ATCCTCAGCC ATCATATCAG 10080 

AGGAAGTTTT CATCTGCACA CTGTCATCTG TAGAAATGAC CTGACCAGAT TTGGTCACCA 10140 

GCACAACAGT TTTCAAGTCC TTATCTGACT TCAAGATGGT CAAAAACAAA TCTCGGATTC 10200

CCTCGACCTT GTCTTGACTG GGATTCTCAG CATAGGCCAG AACATCCGTC TGCTGGGTCA 10260 

AACCAGTCGA GGTGGTTTCT AGTTTTTTGA TATAAGACTG AATAAAGTGG CTAGTCTGGC 10320

TGATGGTCGT TTGGCTGTTG CCCTCAATGG TGGCCTCAAT GGCTGAAGAA CTTGATTGAT 10380 

AGTAGAAAGT TCCAACCAGA GCTAGGAGAA TGAGAAAGAC CAGAAAGATG GAAATAACCA 10440 

TTCTAACTAA AAGAGAAGAA CGCTTCATCG GTCTTCTCCC TTCTTAAACT GACGAGGTGT 10500 

CACACCTGCA ATCTGCTTAA AACGTTGGGT AAAATAGTTC ATATCTTCAA AACCAACCTT 10560 

CTCTGCGATC TCATAAATCT TCAGATCTGT AGTTAAAAGC AAGAGCTTGG CTTGTTTAAC 10620 

ACGTTCTCTC ACCAGATAAT CCTGAAAAGG CAAGCCCAAC TCTTTCTTAA TCAAGGAACT 10680 
CAGATAGGTC GGACTAAAAC CTAAGTCACT GGCTAAAGAC TTTAAACTAA ATTGGCTATC 10740

AGCCAGATGA GACTGGATTT TCTGGGCCAT GTTTCCTTCA AACCTATTAG TCAATAAATC 10800 

TTGTAACTGC TCTTCTTTCT CTTCCTTGTC TAGTTTTTGT TTGATTTTCC CCAACATTTC 10860 

CTCAATATCC TGACGAGAAA AGGGTTTGAG CAGGTAGTCG TCCACACCTA GTTTGACAGC 10920 

AGACAAGGCA TAATCAAAAT CATCGTAACC TGTTAAAAAG ACCAAATGAA CCTGAGGATA 10980 

GGTTTCTCGT ACCAGACTGG CCAACTGGAT GCCATTTAGA TGAGGCATGT TGATATCGGT 11040 

TAAAATGATA TCTGGCACCT GCTTTTGGAT CAATTCCCAA GCCTGCCTTC CATTTTCAGC 11100 

CTGACCGATG ATTTCCATAT CGTAGGCTGC TACATTGACC AGTTTAGTCA AACCTTGTCT 11160 

TACCAGATAT TCATCTTCTA CGATTAAGAT TGTGTAGGTC ATGCTCTGCT CCTTTACCAC 11220 

TTACTAGTAT CAGTATAGCA AAATTCTCCT CTAACTGCTT AGGAAAGACC TCTTATACTC 11280 

AATAAAAATC AAAAAGTAAA CTAGGAAGAT AGCCACAGGT TTCTCAAAGT ACCGCTTTGA 11340 

GGTTGTAAAT AAAACTGACG AAGTCGACTC AAAGTATAGC TTTGAGGTTG TAGATAAAAC 11400 

TGACGAAGTC GATAACCCTA CATACGGTAA GGCGACGCTG ACGTGGTTTG AAGAGATTTT 11460 

CGAAGAGTAT TAATCAACAT AATCTAGTAA ATAAGCGTAC CTTTTTCTTC CATTTGGTCT 11520

TTGGGAATAA AGCGGATAGA GAGGCTATTG ATACAGTAAC GTAAGCCGCC CTTGTCCTGT 11580 

GGACCATCCG TAAAGACATG CCCAAGGTGA GAATCTCCTA CTCGGCTCCG CACTTCCATA 11640 

CGCGTCATAT TGTAGGACTT ATCTTCCTTG TAGGTGACAA CATCTGGACT GATGGGTTGG 11700 

GTAAAACTAG GCCAGCCACA ACCAGACTCA AATTTGTCTT TTGATGAAAA GAGAGGTTCC 11760 

CCAGTTGCTA TATCCACATA GATACCGGAT TCAAATTTAT CCCAGTAACG GTTTGAGAAA 11820 

GCTCGTTCTG TTTGATTTTC CTGGGTAACT GCATACTCCT CAGGTGACAG GGTCTTTTTC 11880 

AATTCCTCAT CACTTGGTTT TGGATATTTG CTGGCATCAA TGACAGGATA GGCCGCCTGA 11940 

TTAACATTGA TATGGCAGTA GCCATTTGGA TTTTTCTTGA GATAGTCTTG ATGGTAATCC 12000 

TCAGCCACCA CAAAATTCTT CAAGTTTTCC TTTTCAACTG CTAGAGGTTG ATCGTATTTC 12060 

TTAGCCACCT CATCAAAGAC TTGGTTAATC ACTTCCAAAT CCTTGTCATC TGTGTAATAA 12120 

ACACCAGTAC GGTACTGGGT CCCCACATCA TTTCCTTGTT TATTTTTGCT GGTTGGATTG 12180 

ATAATGCGGA AATAGTGAAG CAGGATTTCC TTGAGAGAAA TTTGCTTGGC ATCATAGGTG 12240 

ACATGGACGG TTTCTGCATG ACCTGTTTGG TTAATCAATT CGTACTTGGT TGTTTCTCCT 12300 

CTACCATTTG CATAGCCTGA AACGGCATCC GTCACCCCGG GAACACGTGA GAAATATTCC 12360 

TCCACTCCCC AGAAACAACC TCCAGCTAGA TAAATTTCGT GCAAGTCTGC GTCTTTACTA 12420 



ATTTCTGTTT TTTTCACTGC TTTTCCTCCT TGGCTAACTG CCGCCTTTTC AATTTGCGAG 12480

GCATCTGTCT GCCCTGCATT TCGTATCAAT AGAACATAGA AACCGGTTAT GGCTAGAAAA 12540 

AATACTCCTA GCAACAAGAA GATTTTTAAC TTATCATTCA TAAGACGCCT CCTAGGCTAA 12600 

TTCCTTCAAA GTTTGCAAAA TTGCATCTTT TTCCATGAAT CCTGGATGTG TTTTGACCAG 12660 

CTTGCCTTCT TTGTCTATAA AGGCTTGGGT TGGGTAAGAA CGGACACCAT AAGTTTCCAA 12720 

AAGTTTGCCT GATGGGTCAA CTAGGACTGG GAGATTTTTA TAATCCAATC CCTTATACCA 12780 

ATTCTTAAAG TCCGCTTCAG ATTGCTCTCC CTTATGTCCT GGTGACACTA CTGTCAAGAC 12840 

CACATAGTCA TCACCAGCTT CTTTAGCAAT CTCATCCGTA TCTGGAAGAC TAGCCAGACA 12900 

GATGGAACAC CAAGAAGCCC AGAATTTGAG ATAGACTTTC TTGCCCTTGT AATCAGATAA 12960 

ACGGTAGGTC TTGCCATCTA CTCCCATCAA TTCAAAATCA GCCACCTCTT TCCCTTTAGC 13020 

TGCGCTTGTT TTACTAGCTG TCTGCTCCGT CTTCATTTCA TCTTTCGTTT GGTGTTCACT 13080 

AGTCACGGAC TTGCCTGAAC AAGCCGTCAA ACAAAGGAGC GAACCTGCTC CAAGAACACA 13140 

TGTTTGCCAT TTTTTCATAT TGATATTCCT TTCCATTTTA TTCAAATAAT TGACTTAAAA 13200 

TTGAAGCATT TCCAAACAGA ACCAAGAAGC CCATCACAAT AATGAGAAAA CCACCCACTT 13260 

TTTTGAGGAT TCCGAGATAG GGATGAAGTT TTCGGAAATG TTTCAAAACA TAACTAGAGG 13320 

TCAGAGCTAG AAGCAAGAAT GGTAGCGCCA AGCCCAGCGT ATACACCAAC ATGAGACCAG 13380 

CTCCCTGCCA AGCTCCTGAA CCACCTGAAG CCGCCAAGGC CAAAACAGAC CCCAGAACCG 13440 

GCCCCACGCA AGGCGTCCAA GCAAAACTAA AGGTCAAGCC CAATAAAAAT GCCTGACTAT 13500 

AGCCCTTACC ATTTTGCCCC TGTCCTTGCA GTTGTAGCCT CTTTTCCTTA TAAAGCCCCT 13560 

TAAAGTGTAG AATCTCCATT TGGTGCAAAC CAAGAAGGAT AATAATTGCC CCAGTAAGAT 13620 

ATTGGAACCA AGAAGCATAA AGCAAATCGC CTAAAAAACC AGCTCCATAG CCCAACAAAA 13680 

TAAATATAAA GGAAATTCCT GCTATAAAGG CCAGAGTTCG TAATAAACTA GTAACTGAGA 13740 

TTGAAAATTT GCCGCTAGAA GCCTGAGCAC CATCCTTATC ATCTAGTAAC ACTCCTGTAT 13800 

AGACCGGTAA CAAAGGTAAG ATACAAGGAG AAAAGAAGGA TAGAATCCCT GCCAAAAAGA 13860 

CACTTAGAAA AAAGAAAATA TGACCCATAA AGTTCCTCCT ATCATTTTAT TGATAGATTT 13920 

ATTATA 13926 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20199 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:



CCCAGCAGAA AAATGGCATT TGGAGATAAT GGAAATCGTA AAAAAACTAT GTTTGAGAAA 60

ATAACCTTGT TTATCGTGAT TATCATGCTA GTAGCAAGTT TATTGGGAAT TTTTGCAACT 120

GCAATTGGTG CCCTCAGTAA TCTATAAAAT AGATTCAAGA AAATTTAGTG ACTGGGATTT 180

CCCAGCCCTT TTTTAAAGTG AGAAGAAATA ATGAGTATGT TTTTAGATAC AGCTAAGATT 240

AAGGTCAAGG CTGGTAATGG TGGCGATGGT ATGGTTGCCT TTCGTCGTGA AAAATATGTC 300

CCTAATGGAG GCCCTTGGGG TGGTGATGGT GGTCGTGGAG GCAATGTGGT CTTCGTTGTA 360

GACGAAGGAC TACGTACCTT GATGGATTTC CGCTACAATC GTCATTTCAA GGCTGATTCT 420

GGTGAAAAAG GGATGACCAA AGGGATGCAT GGTCGTGGTG CTGAGGACCT TAGAGTTCGA 480

GTACCACAAG GTACGACTGT TCGTGATGCG GAGACTGGCA AGGTTTTAAC AGATTTGATT 540

GAACATGGGC AAGAATTTAT CGTTGCCCAC GGTGGTCGTG GTGGACGTGG AAATATTCGT 600

TTCGCGACAC CAAAAAATCC TGCACCGGAA ATCTCTGAAA ATGGAGAACC AGGTCAGGAA 660

CGTGAGTTAC AATTGGAACT AAAAATCTTG GCAGATGTCG GTTTAGTAGG ATTCCCATCT 720

GTAGGGAAGT CAACACTTTT AAGTGTTATT ACCTCAGCTA AGCCTAAAAT TGGTGCCTAC 780

CACTTTACCA CTATTGTACC AAATTTAGGT ATGGTTCGCA CCCAATCAGG TGAATCCTTT 840

GCAGTAGCCG ACTTGCCAGG TTTGATTGAA GGGGCTAGTC AAGGTGTTGG TTTGGGAACT 900

CAGTTCCTCC GTCACATCGA GCGTACACGT GTTATCCTTC ACATCATTGA TATGTCAGCT 960

AGCGAGGGCC GTGATCCATA TGAGGACTAC CTAGCTATCA ATAAAGAGCT GGAGTCTTAC 1020

AATCTTCGCC TCATGGAGCG TCCACAGATT ATTGTAGCTA ATAAGATGGA CATGCCTGAG 1080

AGTCAGGAAA ATCTTGAAGA CTTTAAGAAA AAATTGGCTG AAAATTATGA TGAATTTGAA 1140

GAGTTACCAG CTATCTTCCC AATTTCTGGA TTGACCAAGC AAGGTCTGGC AACACTTTTA 1200

GATGCTACAG CTGAATTGTT AGACAAGACA CCAGAATTTT TGCTCTACGA CGAGTCCGAT 1260

ATGGAAGAAG AAGCTTACTA TGGATTTGAC GAAGAAGAAA AAGCCTTTGA AATTAGTCGT 1320

GATGACGATG CGACATGGGT ACTTTCTGGT GAAAAACTCA TGAAACTCTT TAATATGACC 1380

AACTTTGATC GTGATGAATC TGTCATGAAA TTTGCCCGTC AGCTTCGTGG TATGGGGGTT 1440

GATGAAGCCC TTCGTGCGCG TGGAGCTAAA GATGGGGATT TGGTCCGCAT TGGTAAATTT 1500

GAGTTTGAAT TTGTAGACTA GGAGACTGGT ATGGGAGATA AACCGATATC TTTCCGAGAT 1560

GCGGATGGTA ATTTTGTTTC CGCCGCAGAC GTTTGGAATG AAAAGAAATT GGAAGAACTA 1620

TTTAATCGTC TCAATCCAAA TCGTGCCTTG AGATTGGCAC GAACTAAAAA GGAAAATCCA 1680

TCTCAGTAAA GAAGCTAAAA AATCCCGTGC CTCATCAGAC ACGGGATTTT GTGGTACGAC 1740 

AGGCATGTAT AGCAAACTGA ATCTGGAATA GCACAGCATA TCTTCTAAAA TATAGTAAAA 1800 

TGAAATGAGA ACAGGACAAA TCGATCAGGA CAGTAAAATC GATTTCTAAC AATGTTTTAT 1860 

AAGCAGAGAT GTACTATTCT AGTTTCAATC AACTATATTG TTATAAATTG ATTTGAATTT 1920 

CAAAATTAAA TTGTTTGATT CTTATTTCAA TTTGTTATAG TATATCTGAT GTCAAAGTTC 1980 

TCGGCGAGTC AAATAGCGAT TCCCAAGCCT GACTATCGTG AGGTAGCGGA TTAAAATGGT 2040 

CTGGGGATAG ACCGTTTTAA GTCTGACGCT GGAAATAAGA ATTGTCAGAA GAAGGGATAG 2100 

CGAAATCGTG GCTCTACGAA CAGGAACGTG ATAATAAGGC GTATATAGCG GATAAGAGGG 2160 

CATCAAACTC TAAAGTCCAA AAAGGTAGTC GTAACCTATA TGCGTAAATC ACGAGAGTAA 2220 

TTGAATTCGT ACTAAGATTT TCTATTTTCA CTGTAACCTT TTAACGCCCT TATATCTTGT 2280 

ATACACGAGG AAAGATGTAC GACTTATCCC GTGAGGTCTA TCACTATAAA GAGAAAACGA 2340 

CAGATAGAAG TGATCCTGAG TCACGGTTAT CTGTCTGATA GGACGGTATG TATAAAACGC 2400 

TTCTGTGAAC TGAGAGAAGG GGGAGAAGTT CTTGCTAAAA TTTAGTTGAA CAGCCGTATT 2460

CCGATACTTA GATAAGAGAT CTAGTCTTAG CTCCTACTCA GTTTTAGGGG ATAAAAAAGG 2520 

GGCAATAGCG ATTCGAGAAA GATTATACTC TTCGAAAATC TCTTCAAATC ACGTCAATAT 2580 

CGCCTTGTCG TATGTGTAGG ATACTGACTA CGTCAGTTCC ATCTACAACC TCAAAACAGT 2640 

GTTTTGAGCA ACCTGCGGCT AGTTTCCTAG TTTGATCTTT GATTTTCATT GAGTATTAGT 2700

AATTCAGTTA CTAACTCGTC AACTCTGATT TATCCAATAA AATTGAAAAG GATGGAAAAA 2760 

AGGATAAATT TATGATATAC TTTATTTTGA AGACCTTATT AGAAATCTTG AAAGAGTATT 2820 

GAAAACTTAG AATGAGAAAA ATTGTTATCA ATGGTGGATT ACCACTGCAA GGTGAAATCA 2880 

CTATTAGTGG TGCTAAAAAT AGTGTCGTTG CCTTAATTCC AGCTATTATC TTGGCTGATG 2940 
ATGTGGTGAC TTTGGATTGC GTTCCAGATA TTTCGGATGT AGCCAGTCTT GTCGAAATCA 3000

TGGAATTGAT GGGAGCTACT GTTAAGCGTT ATGACGATGT ATTGGAGATT GACCCAAGAG 3060 

GTGTTCAAAA TATTCCAATG CCTTATGGTA AAATTAACAG TCTTCGTGCA TCTTACTATT 3120 

TTTATGGGAG CCTCTTAGGC CGTTTTGGTG AAGCGACAGT TGGTCTACCG GGAGGATGTG 3180 

ATCTTGGTCC TCGTCCGATT GACTTACACC TTAAGGCGTT TGAAGCTATG GGTGCCACTG 3240 

CTAGCTACGA GGGAGATAAC ATGAAGTTAT CTGCTAAAGA TACAGGACTT CATGGTGCAA 3300 

GTATTTACAT GGATACGGTT AGTGTGGGAG CAACGATTAA TACGATGATT GCTGCGGTTA 3360 

AAGCAAATGG TCGTACTATT ATTGAAAATG CAGCCCGTGA ACCTGAGATT ATTGATGTAG 3420 
CTACTCTCTT GAATAATATG GGTGCCCATA TCCGTGGGGC AGGAACTAAT ATCATCATTA 3480

TTGATGGTGT TGAAAGATTA CATGGGACAC GTCATCAGGT GATTCCAGAC CGCATTGAAG 3540 

CTGGAACATA TATATCTTTA GCTGCTGCAG TTGGTAAAGG AATTCGTATA AATAATGTTC 3600 

TTTACGAACA CCTGGAAGGG TTTATTGCTA AGTTGGAAGA AATGGGAGTG AGAATGACTG 3660 

TATCTGAAGA CAGCATTTTT GTCGAGGAAC AGTCTAATTT GAAAGCAATC AATATTAAGA 3720 

CAGCTCCTTA CCCAGGCTTT GCAACTGATT TGCAACAACC GCTTACCCCT CTTTTACTAA 3780 

GAGCGAATGG TCGTGGTACA ATTGTCGATA CGATTTACGA AAAACGTGTA AATCATGTTT 3840 

TTGAACTAGC AAAGATGGAT GCGGATATTT CGACAACAAA TGGTCATATT TTGTACACGG 3900 

GTGGACGTGA TTTACGTGGG GCCAGTGTTA AAGCGACCGA CTTAAGAGCT GGGGCTGCAC 3960 

TAGTCATTGC TGGGCTTATG GCTGAAGGTA AAACTGAAAT TACCAATATC GAGTTTATCT 4020 

TACGTGGTTA TTCTGATATT ATCGAAAAAT TACGTAATTT AGGAGCGGAT ATTAGACTTG 4080 

TTGAGGATTA AACCGTAGAG GTGTTTATGA ATATTTGGAC CAAATTAGCA ATGTTTTCTT 4140 

TTTTTGAAAC GGATCGCTTG TATTTGCGTC CTTTCTTTTT TAGTGATAGT CAGGACTTCC 4200 

GCGAGATAGC TTCAAATCCA GAAAATCTTC AATTTATTTT CCCAACGCAG GCAAGTCTGG 4260 

AAGAAAGTCA ATATGCACTG GCCAATTACT TTATGAAGTC CCCTTTGGGA GTGTGGGCAA 4320 

TTTGTGACCA GAAAAATCAA CAAATGATTG GTTCTATTAA ATTTGAGAAG TTAGATGAAA 4380 

TCAAAAAAGA AGCTGAGCTT GGCTATTTTT TGAGAAAAGA TGCTTGGTCG CAAGGATTTA 4440 

TGACAGAGGT TGTTAGAAAA ATTTGTCAGC TTTCTTTTGA GGAATTTGGC TTAAAACAAT 4500 

TATTTATCAT TACCCACCTT GAAAATAAAG CTAGCCAAAG AGTTGCTCTT AAGTCTGGAT 4560 

TTAGTTTGTT CCGTCAGTTT AAGGGAAGTG ATCGTTACAC AAGAAAAATG CGGGATTATC 4620 

TTGAATTTCG GTATGTAAAA GGAGAGTTCA ATGAGTAAGC ATCAGGAAAT TCTAAGCTAT 4680 

TTGGAGGAAT TACCAGTAGG TAAAAGGGTC AGTGTTCGTA GCATTTCGAA TCATCTAGGA 4740 

GTTAGTGATG GAACAGCCTA TCGGGCTATT AAAGAAGCTG AAAACCGTGG AATTGTGGAG 4800 

ACCCGTCCTA GAAGTGGAAC AATTCGTGTT AAATCCCAGA AAGTTGCTAT AGAGAGATTA 4860 

ACGTTTGCTG AAATTGCAGA AGTGACTTCT TCTGAGGTTC TGGCTGGGCA AGAAGGTTTA 4920 

GAGAGAGAAT TTAGTAAGTT TTCAATTGGT GCCATGACTG AACAAAATAT CTTGTCTTAC 4980 

CTTCATGATG GGGGGCTCTT GATTGTCGGA GACCGAACCC GTATTCAGTT GCTAGCCTTG 5040 

GAAAATGAAA ATGCAGTTCT GGTTACAGGG GGATTTCAGG TTCATGATGA TGTGCTTAAA 5100 

CTGGCCAATC AAAAAGGGAT TCCTGTTCTA AGAAGTAAGC ATGATACCTT TACCGTCGCG 5160 
ACCATGATCA ATAAAGCCTT GTCAAATGTC CAAATCAAGA CTGATATTCT GACAGTTGAG 5220

AAACTTTATC GCCCTAGTCA TGAGTATGGT TTTCTGAGAG AGACAGATAC AGTTAAAGAT 5280 

TATTTGGACT TGGTTCGTAA GAATCGTAGC AGCCGTTTCC CTGTTATCAA TCAACATCAG 5340 

GTCGTTGTTG GTGTTGTAAC CATGAGAGAC GCTGGTGATA AATCACCAAG CACGACAATT 5400 

GATAAGGTTA TGTCTCGTAG TCTATTTTTG GTTGGATTAT CGACAAATAT TGCCAATGTG 5460 

AGTCAACGGA TGATCGCAGA AGACTTTGAA ATGGTACCAG TTGTTCGAAG CAATCAAACT 5520 

TTGCTTGGCG TTGTGACGCG ACGAGATGTC ATGGAGAAGA TGAGCCGTTC CCAAGTTTCG 5580

GCTCTACCAA CTTTTTCTGA GCAGATTGGA CAAAAGCTCT CTTATCACCA TGATGAAGTA 5640 

GTCATTACAG TGGAACCCTT TATGCTAGAA AAAAATGGAG TTTTGGCTAA TGGTGTATTG 5700 

GCAGAAATTC TGACCCACAT GACCCGATTT AGTTGTTAAT AGTGGTCGGA ATCTCATTAT 5760 

CGAGCAGATG CTGATCTACT TTTTGCAGGC TGTTCAGATA GATGATATAT TGCGCATTCA 5820 

GGCACGGATT ATTCATCATA CGAGACGGTC AGCTATAATT GATTACGATA TTTATCATGG 5880 

TCACCAGATT GTTTCAAAAG CAAATGTGAC TGTTAAAATT AATTAGAAAC TAGGAGAAAA 5940 

GATGATAACA TTAAAATCAG CTCGTGAAAT CGAAGCTATG GACAAGGCTG GTGATTTTCT 6000 

AGCAAGTATT CATATAGGCT TACGTGATTT GATTAAGCCA GGCGTAGATA TGTGGGAAGT 6060 

TGAAGAATAT GTCCCCCGTC GTTGTAAAGA AGAAAATTTC CTTCCACTTC AGATTGGGGT 6120 

TGACGGTGCC ATGATGGACT ATCCTTATGC TACCTGTTGC TCTCTTAACG ATGAAGTGGC 6180 

TCACGCTTTC CCTCGTCATT ATATCTTGAA AGATGGTGAT TTGCTCAAAG TTGATATGGT 6240 

TTTGGGAGGT CCCATTGCTA AATCTGACCT AAATGTCTCA AAATTAAACT TCAACAATGT 6300 

TGAACAAATG AAAAAATACA CTCAGAGCTA TTCTGGTGGT TTAGCAGACT CATGTTGGGC 6360 

TTATGCTGTT GGTACACCGT CCGAAGAAGT CAAAAACTTG ATGGATGTAA CCAAAGAAGC 6420 

TATGTACAAG GGTATTGAGC AAGCTGTTGT TGGAAATCGT ATCGGTGATA TCGGTGCGGC 6480 

TATTCAAGAA TACGCTGAAA GTCGTGGTTA CGGTGTAGTG CGTGATTTGG TTGGTCATGG 6540 

TGTTGGCCCA ACTATGCACG AAGAACCAAT GGTTCCTAAC TATGGTATTG CAGGTCGTGG 6600

ACTCCGTCTT CGTGAAGGAA TGGTCTTAAC CATTGAACCA ATGATCAATA CAGGCGATTG 6660

GGAAATTGAT ACAGATATGA AAACTGGTTG GGCGCATAAG ACCATTGACG GTGGATTGTC 6720 

ATGTCAGTAT GAACACCAAT TTGTCATTAC GAAAGATGGA CCTGTTATCT TGACTAGCCA 6780 

AGGTGAAGAA GGAACTTATT AATAAAAAGT GAAAAGACTA CTGGAAGTTT ATTTTGATAA 6840 

AAAATCCAGT AGATCTTTTC ATAATAAAAC GCATTGTATC AAGTGTTAGG GGCTGATATC 6900 

ATGCGTTTTT CTGCTTTTAA GATTTTTTCC AACTCTGTTT GTAAGCGCAT CATAACAAAG 6960 
GGTCTAGGAT TCAGGGCTCT CCTCCTATAT ACTATTAGTA AAGTAAAACT AAGGGAGGAT 7020

ATTTTAGTGT CGCAGTCTAT TGTTCCTGTA GAGATTCCAC AATATTGTCG TTTTGATTCT 7080 

AAAAAGAGAA ATGGAATTCT GTTTAATGTT CGTATTGCCA ATCTTAAATT TACTTTTTTA 7140 

TATTATACTT CCTGCGAAAC AAAATATGGT ATAGTAGTTC TATGAATGAT GAAGCAAGTA 7200 

AACAACTAAC TGATGCACGA TTTAAGCGTC TTGTTGGTGT TCAGCGTACC ACTTTTGAAG 7260 

AGATGTTAGC TGTATTAAAA ACAGCTTATC AACTTAAACA CGCAAAAGGT GGACGAAAAC 7320 

CTAAATTAAG CCTAGAAGAC CTTCTTATGC CCACTCTTCA ATAGTGCGAG AATATCGAAC 7380 

TTATGAAGAA ATTGCGGCTG ATTTTGGTAT TCACGAAAGC AACTTTATCC GTCGGAGCCA 7440 

ATGGGTTGAA ATAACTCTTG TTCAAAGTGG TTTTACGGTT TCAAGAACTC CTCTCAGTTC 7500 

TGAGGACACG GTAATGATTG ATGCGACGGA AGTAAAAATC AATCGCCCTA AAAAAACAAT 7560 

TAGCGAATGA TTCTGGTAAA AAGAAATTTC ACGCTATGAA GGCTCAAGCG ATTGTCACAA 7620 

GTCAAGGGAG AATTGTTTCT TTGGATATCG CTGTGAACTA TAGTCATGAT ATGAAGTTGT 7680 

TCAAAATGAG TCGTAGAAAT ATCGAACAAG CTGGTAAAAT CTTGGCTGAC AGTGGTTATC 7740 

AAGGGCTCAT GAAGATATAT CCTCAAGCAC AAACTCCACG TAAATCCAGC AAACTCAAGC 7800 

CGCTAACAGC TGAAGATAAA GCCTATAACC ATGCGCTATC TAAGGAAAGA AGCAAGGTTG 7860 

AGAACATCTT TGCCAAAGTA AAAACGTTTA AAATATTTTC AACAACCTAT CGAAATCATC 7920 

GTAAACGCTT CGGATTACGA ATGAATTTGA GTGCTGGTAT TATCAATCAT GAACTAGGAT 7980 

TCTAGTTTTG CAGGAAGTCT ATTGAGGTAT TGAGCTAGTT TATGAAAAAA TTGGGTGAAA 8040 

AGTCGAGTGT TTTAGAAACC CACAGTGTAG TATTCTAGTT TCAATCCACT ATATTTTGCT 8100 

ACTCCCCGTA AAGTTTCTAT TTTCCCTGAT TTCTGATATA ATAGAAATAT TGACTTCAAG 8160 

AGTAAGGAAG AGAAGATGAA CGCATTATTA AATGGAATGA ATGACCGTCA GGCTGAGGCG 8220 

GTGCAAACGA CAGAAGGTCC CTTGCTAATC ATGGCAGGGG CTGGTTCTGG AAAGACTCGT 8280 

GTTTTGACCC ACCGTATCGC TTATTTGATT GATGAAAAGC TGGTCAATCC TTGGAATATC 8340 

TTGGCCATTA CCTTTACCAA CAAGGCTGCG CGTGAGATGA AAGAGCGTGC TTATAGCCTC 8400 

AATCCAGCGA CTCAGGACTG TCTGATTGCG ACCTTCCACT CCATGTGTGT GCGTATTTTG 8460 

CGTCGCGATG CGGACCATAT TGGCTACAAT CGTAATTTTA CAATTGTGGA TCCTGGTGAA 8520 

CAGCGAACGC TCATGAAACG TATTCTCAAA CAGTTGAACT TGGACCCTAA AAAATGGAAT 8580 

GAACGAACTA TTTTGGGGAC CATTTCCAAT GCTAAGAATG ATTTGATTGA TGATGTTGCT 8640 

TATGCTGCCC AAGCTGGCGA TATGTATACG CAAATTGTGG CCCAGTGTTA TACAGCCTAT 8700 
CAAAAAGAAC TTCGTCAGTC TGAATCCGTT GACTTTGATG ATTTGATTAT GCTGACCTTG 8760

CGTCTCTTTG ATCAAAATCC TGATGTTTTG ACCTACTACC AGCAAAAATT CCAATACATC 8820 

CACGTTGATG AGTACCAAGA TACCAACCAC GCTCAGTACC AATTGGTCAA ACTCTTGGCT 8880 

TCCCGTTTTA AAAATATCTG TGTGGTTGGG GATGCGGACC AGTCTATCTA CGGTTGGCGT 8940 
GGTGCTGATA TGCAGAATAT CTTGGACTTT GAAAAGGATT ACCCCAAAGC CAAGGTTGTT 9000

TTGTTGGAGG AAAATTACCG CTCAACCAAA ACCATTCTCC AAGCGGCCAA CGAGGTTATT 9060 

AAAAATAATA AAAATCGCCG TCCTAAAAAT CTCTGGACTC AAAACGCTGA TGGGGAGCAA 9120 

ATCGTTTACT ATCGTGCCGA TGATGAGCTG GATGAGGCTG TATTTGTAGC CAGAACCATC 9180 

GATGAACTTA GTCGCAGTCA AAACTTCCTT CATAAGGATT TTGCAGTTCT CTATCGGACT 9240 

AATGCCCAGT CCCGTACAAT TGAGGAAGCC CTGCTCAAGT CTAACATTCC TTATACCATG 9300 

GTTGGCGGAA CCAAATTCTA CAGCCGTAAG GAAATTCGCG ATATTATTGC TTATCTCAAC 9360 

CTTATTGCTA ATTTGAGTGA CAATATTAGT TTTGAGCGTA TTATCAACGA GCCTAAACGT 9420 

GGAATTGGTC TAGGTACAGT TGAGAAAATC CGTGATTTTG CAAATTTGCA AAATATGTCT 9480 

ATGCTGGATG CTTCTGCTAA TATTATGTTG TCTGGTATCA AGGGTAAGGC AGCCCAATCT 9540 

ATCTGGGATT TTGCCAATAT GATGCTTGAT TTGCGGGAGC AGCTAGACCA CTTAAGCATT 9600 

ACAGAGTTGG TTGAGTCCGT CCTAGAAAAA ACAGGTTATG TCGATATTCT TAACTCCCAA 9660 

GCGACTCTAG AAAGCAAGGC ACGGGTTGAA AATATCGAAG AGTTTCTTTC TGTTACGAAG 9720 

AACTTTGATG ACACCACGGA TGTGACAGAA GAGGAAACTG GTCTGGACAA ACTGAGTCGT 9780 

TTCTTAAATG ACTTGGCTTT GATTGCCGAC ACAGATTCAG GTAGTCAGGA GACATCAGAA 9840 

GTGACCTTGA TGACCCTGCA TGCTGCCAAA GGTCTCGAAT TTCCAGTTGT CTTTTTGATT 9900 

GGGATGGAAG AAAATGTCTT TCCACTTAGT CGTGCGACTG AAGATTCAGA TGAATTAGAA 9960 

GAAGAGCGCC GTCTAGCCTA TGTAGGTATC ACGCGTGCAG AGAAAATTCT CTATCTGACC 10020 

AATGCCAACT CACGCTTGCT TTTTGGTCGT ACCAATTATA ACCGTCCGAC TCGTTTTATT 10080 

AACGAAATCA GTTCAGACTT GCTTGAGTAT CAAGGTCTGG CTGGTCCTGC AAATACAAGG 10140 

TTTAAGGCAT CATATAGCAG TGGTAGTATT TCCTTTGGTC AAGGTATGAG TTTGGCTCAG 10200 

GCTCTTCAAG ACCGTAAACG CGGTGCTGCC CCAAAATCAA TCCAGTCAAG CGGTCTTCCA 10260 

TTTGGTCAAT TTACAGCTGG CGCAAAACCA GCATCTAGCG AGGCAAATTG GTCCATTGGT 10320 

GATATTGCTC TCCACAAGAA ATGGGGAGAG GGAACCGTTC TGGAAGTTTC AGGTAGCGGT 10380 

GCTAGGCAGG AATTGAAAAT CAATTTCCCA GAAGTAGGTT TGAAAAAACT TTTAGCCAGT 10440 

GTGGCTCCAA TTGAGAAAAA AATCTAATTT TCCATCCTTC TCACGAATAA TAAAGTGAGG 10500 
AGGATTTTTA TGTACAGTAT TTCATTCCAA GAAGATTCAC TATTACCAAG AGAAAGGCTG 10560

GCCAAGGAAG GAGTTGAAGC GCTTAGTAAC CAAGAGTTGC TAGCTATTTT ACTCAGGACA 10620 

GGAACACGTC AAGCTAGCGT TTTTGAAATT GCCCAAAAAG TCTTGAACAA TCTTTCAAGC 10680 

CTAACGGATT TGAAAAAAAT GACCCTGCAG GAATTGCAGA GTTTGTCTGG TATTGGGCGT 10740 

GTTAAGGCCA TAGAATTACA AGCTATGATT GAACTGGGGC ATCGTATTCA CAAACACGAG 10800 

ACTCTTGAAA TGGAAAGTAT TCTCAGCAGT CAAAAGTTGG CCAAGAAGAT GCAGCAGGAA 10860 

TTAGGGGATA AAAAACAAGA GCACCTGGTG GCACTCTATC TCAATACTCA AAATCAAATC 10920 

ATCCATCAGC AGACCATTTT TATCGGGTCT GTAACTCGTA GTATCGCTGA ACCGCGAGAG 10980 

ATTCTTCACT ATGCAATCAA GCATATGGCG ACTTCTCTTA TCTTGGTCCA CAATCATCCT 11040 

TCAGGAGCGG TAGCGCCTAG CCAAAATGAT GATCATGTCA CTAAACTTGT TAAAGAAGCC 11100

TGCGAATTGA TGGGGATTGT TCTCTTGGAC CATTTGATTG TCTCTCATTC TAATTACTTT 11160 

AGTTATCGTG AAAAGACAGA TTTAATCTAA AGTTCATTAA CGACATAGTC AAAGAGTTTT 11220 

TTATCTTTGG GACGATTTTC AAAAAGAAGT TCTGGATGCC ATTGGACACC GAGAAAGGCG 11280 

ACATCATCCG TACTCATGAC AGCCTCAATG ATACCATCTT TAGGATCATG AGCCACAACT 11340 

TTTAAATTTG GTGCTAAGTC CTTGATGCTC TGGTGGTGGA AGGAGTTGAT ATGAGAGATT 11400 

TCTCCATAGA TTTCTTGGAG AACGGTATCT GGTTCTGTTA CCAAGCGTTG AGTTGTGTAC 11460 

TCAACAGAAG AATCCTGCCA ATGGTCTTCG ATATCTTGGT ACAAAGTTCC ACCCATGGCA 11520 

ACGTTAAAGA GTTGGGTACC ACGGCAGACA GAGAAAATGG GCTTTTTCTG TTTAATAGCT 11580 

TCCTTGATGA GGGCCAGTTC GAAGATATCT CTTTGAAGGT GATAGTCATC ACTATCAATG 11640 

GTTTTGGGTT CGCCATAAAA TTTTGGATCG ACATTTTGCC CACCTGTCAA GATGAGCTTG 11700 

TCAATCAAAC TGATATAGTG GCAGGCCATT TCTTGATCAC CAATCGGTAG GATGATGGGA 11760 

ATCCCTCCAG CATCTTTAAC GCCTTCAACA AAGCCTTTTG CTGCGTAGCT CATCATGATG 11820 

TCATCATCTG GATGAGTTTT TTCGTTTCCT GTAATCCCAA TAACTGGTTT TTTCATAAAA 11880 

TGATTTTCGC TTTCTAATCC TCTTTTCGCA TGAAGTAGAG GAGGGTTTGG AGTTCACTTG 11940 

TCAAATCGAC ATACTGAACG ACCACGTCTT TTGGTAAATG CAGATGGACT GGTGAAAAAC 12000 

TGAGAATTCC TTTCACACCA GCATCAACCA AGAGATTAGC AACCTCTTGT GACTTGACGC 12060 

TGGGAACAGT TAGGATAGCA GTCTTCACAT CAGCATCCTT GATTTTATCC TTGATCTGAG 12120 

AAATCCCGTA AATGGGAATC CCGTCAGGAG TTTGGGTACC GACTTCAGGA TGGTCGTCTA 12180 

GGTCAAAGGC CATGATAATC TTCATCTTGT TACGTTCGTG GAAGCGGTAG TGGAGAAGGG 12240

CATGGCCCAT ATTTCCAATA CCAACCAGCA TGACATTGGT AATAGAGTTG TCATTGAGCA 12300

AATCGGCAAA AAATGTCATT AGTTTTTTGA CATCATAGCC AAAACCACGA CGACCAAGTT 12360 

CACCAAAATA GGAAAAATCA CGACGTACGG TCGCTGAATC AATACCGATA GCCTCTGCAA 12420 

TTTGCTTAGA GTTGGCACGT TCAATCTTTT CTGCATGAAA TCTCTTAAAA ATTCGATAGT 12480 

AGAGAGAGAG TCTTTTTGCT GTAGCTTTTG GAATAGCAAA CTGTTTATCT TTCACAAAAT 12540 

CACAACCTTT CTATTCTTCT ATTTTATAGA AACATTGTGA AAAAATCAAC AAAAATAAGA 12600 

AAAAACTAAG AAAAATCTTA GTTTTGATGT AAAAAATCTG CATGAGATAG AAAACGGTAG 12660 

AGGTCTCCGA CCAGCCCCTG ATAAACTTTT TTGCCCCTAA AAGTCAGAGA AGTCACATAA 12720 

AGTGTATCTG GTAAGGTTAC ACATCCTGAC AAAGTCAACA TGAGAGCCTC ATGATCCTCA 12780 

TACTTGAGAG TACGCTCTAC ATGATAGCAG TCCTTATAGG TCAGTTCAAA CATTTTGGCT 12840 

CTATCTTTCC GATTTTGTAA AGACACCACG TTCTACCAAG CTATCCATGA GGAAGTAGAA 12900 

TTTTTCCTGA TGAATATGGT GGTCTTCTGA TTTGAAAATA TCAACTAGAC GAAGGCCAAA 12960 

CTTGTCAGTG ATATTGATTT TAGCCCCTGT AAGTTCCTTG TTAATGATGA TTTTGAGTTG 13020 

GAAGCCTTCA CCGCTGTTTG GCACTTTTTC CAAAAGGCGA GTCAGTTCAT AGTTACCAAC 13080 

CTTAGTTTCA AAAAAGGTGT TATCTTTGAG GGTGAATTTT TTAACAGAAG GGCTAAGAGT 13140 

GTAATCGTAA CGACAATTTT TTAACTGAAT GATTTTTTCA AATGCCATAT GGCTAACCTC 13200 

CGATAATTTC TTTTAAGGTT TTTGCGAGGG TTTGTAGGTC TTCAACGGTA TTTTGTGGCG 13260 

ACAAACTGAT GCGAAGGGAT TCCTTCAAGC GTTCTGAATT TGCGCCATAC ATGGCTTCAA 13320 

GAACATGGCT GGATTGGACA ACGCCTGCAG TACAGGCTGA GCCAGTAGAG ATTGAAATTC 13380 

CAGCTAAATC TAGCCGAAGG AGTAAGAGGT CATTTTTCTG ACCAGGAAAT CCAATATTGA 13440 

GAACATAAGG GAGATGATGT TTTCCTCTAT TCAGGTAATA CTGAATGCCC TCCAGCTCTG 13500 

CCAGAAAGGC AGTTTCTAGA TTTTGTACAT GTTGAAAATG TTCTTCTTGT TTTTCTAGGT 13560 

CTTCTTTTAG GGCTGCAACC ATGCCTACAA TGGCAGGCAG ATTTTCAGTT CCTGCACGTT 13620 

TTTTCTGTTC CTGGTCTCCG CCATGTAGAT AGGAATCAAA GTCCATGCTA GATGCGTAGA 13680 

GAAAACCGAT TCCCTTAGGA CCATGGAATT TGTGGGCAGA AGCAGTGAGA AAATCAATGC 13740 

CCAATTCTTC TGAATGAATT GGGATTTTAC CAATAGCCTG AACTGCATCA ACATGATAGG 13800 

CAGCAGGGTG TTGCTTGAGT ATTTGGCCAA TTTCAGCGAT GGGCAGTAGG TTTCCTGTCT 13860 

CATTATTGAC AAACATGGTA GAAACCAAAA TCGTATCGTC ACGTAAAGCC TTTTGAATTT 13920 

GCTGGGCTGT GATTTCTTGA TTTTCTGGCT GGATAATGGT TGCTTCAAAC CCAAAGTGTT 13980 

GAACCAAGTA ATCAATTGTT TCAAGGACAG CATGGTGCTC GATGGCAGTT GTGATGATAT 14040

GTTTTCCTTG TTCTTGGTGA CGAAGACAGT AGCCAATGAT GGTAGTATTA TTGCCTTCAG 14100

TCCCACCAGA AGTGAAAAAG ATATGTTGAG GTTTTGTCCT TAGTAACTGG GCTAGTTCCT 14160 

GACGGGCTTC TCGCAAGAGT TTGCCAGCTT GACGACCATG ACCATGAATA CTAGAAGGAT 14220

TTCCGTGGGT TTCTTGCATA ACCTTGGTCA TAGCTGAAAT AGCAACTGCT GACATAGGAG 14280 

TCGTTGCAGC ATTGTCCAAA TAAATCAAAG AATCACCTTA TTTCTTTTTA TTGTAGGCAA 14340 

AGAGTGGGCT GACTGGTTTT CTTTCGTGAA TACGGACGAT AGCATCACCA ATTAACTCAC 14400 

TAGCAGTGAT GTAGCATACA TTTTTAGGAG TTTTTTCTTT TGTTGCTACT GAATCAGTCA 14460 

CAAGAATTTC TTTAATATTA GTATTGTCAA GAAGCTCAGC AGCTCCCTCG ACGAAGAGAC 14520 

CGTGGCTAGA AACAGCATAA ATTTCTGTAG CTCCTTCACG TTCAACGATT TTAGAAGCTT 14580 

CAGAGAAGGT ACGTCCTGTA TTTAAAATAT CATCAATCAA GATAGCTTTC TTACCTTCAA 14640 

CATCACCAAT AATATAACCT TCGTTACGAG TTGCATCGTC TTGAGGGTAG TCGATAATGG 14700 

CGATAGGAGC ATCAAGATAT TCAGCCAGGC TACGCGCACG TTTGACACCT GAATTTTTAG 14760 

GGCTAACGAC AACAACATCT GAACCAAGCA ATCCTTTATC GCAGTAATGT TTTGCGAATA 14820 

GGGGAACAGT GAAAAGATTA TCCACTGGAA TATCAAAGAA ACCTTGAACC TGAACGGCAT 14880 

GCAAATCAAG AGTCAGGATA CGATCAACTC CAGCCTTAAC CAGCATATTG GCAACTAGTT 14940 

TTGCTGTAAG TGGCTCACGA GGACAAGCAA TGCGGTCTTG ACGTGCATAG CCAAAATATG 15000 

GAAGGACAAC GTTGATACTG TGGGCACTTG CACGCACACA AGCATCGACC ATGATTAACA 15060 

ATTCCATTAG GTGGTTGTTG ACAGGGAAAC TTGTTGATTG GATGATGTAA ACATCATAAC 15120 

CACGGACACT TTCTTCGATA TTTACTTGGA TTTCTCCGTC TGAAAATTGA CGTGATGATA 15180 

GTTTTCCAAG TGGGACACCA ACAGCTTGGG CAATTTTTTG TGCAATCTCT TGGTTAGAGT 15240 

TGAGTGCGAA AAGTTTCATG TTTTTTCTAT CTGACATTAT AGACCGTCCT CTGTAAACTT 15300 

TATAAATCCT AGTTATATTT ACCTTACATA TATGAACTGG GATTTGTGTA TTTTTATCTT 15360 

TTCTATTTTA CCAAAAAATG GAGATTATTT CAGCTATTTT TCATACTTTT GACAAATCGA 15420 

ACCAATTTTG AAGGAGCTTT TTGATAGGAA ATCTGATTTT TCTCTAAAAA TTGTCGAAAA 15480 

TCCTGTTTGC CTTGCTCATG ATTTTCCACT TCAAGCTCCA ATTCGTAATC TGTTATATCA 15540 

AAGTATCGGC TCTGATCCAG TGCCATGAGA CCAATAGCTG TTTTCATTTC ATAGCGAAGC 15600 

GTTGTTAGAC AACCAAGAAC CTGCCAGTTC TTACTTTGGA TACCATGTTT CGCCAATTCA 15660 

TCCAGTACTA GCCCTTGAGG AAGTTCTTCC TTACTCAGAT AGTTCTCAGC ATCTTTTAGT 15720 

TGCAATTTTT GGTTGTATTC CATGTTTCCA ACACTCTGCG GGACTTTGAG TGTCAACTCA 15780

GCCCAGTCTT CAAAGGTTCG AATGCGCATA GCGACTTTCT TTTCTCGCAG TTCAAAATCA 15840

GGCGTGTCGA TGTAGTAATT TGTTTGAAGA ACAGGAGTGA CACCTGTGAA CTGGTCTTTT 15900 

AGACGATTGT ATTCATCTTT TTTCAATAGT GTTTTCAATT CAATTTCTAA ATGTTTCATT 15960 

TTTCTTACCT TTTTTTATCG TTGAAAGCGG ATTTATGGTA TAATAAGCAT TGTATTTATT 16020 

GTATATGAAT CTGGAGAAAA AATCAAAGAT ATTTTTGACG GATAATATGA GAACAAGGGA 16080 

GAATATATGA CCTTAGAATG GGAAGAATTT CTAGATCCTT ACATTCAAGC TGTTGGTGAG 16140 

TTAAAGATTA AACTTCGTGG TATTCGTAAG CAATATCGTA AGCAAAATAA GCATTCTCCA 16200 

ATTGAGTTTG TGACCGGTCG AGTCAAGCCA ATTGAGAGCA TCAAAGAAAA AATGGCTCGT 16260 

CGTGGCATTA CTTATGCGAC CTTGGAACAC GATTTGCAGG ATATTGCTGG CTTACGTGTG 16320 

ATGGTTCAGT TTGTAGATGA CGTCAAGGAA GTAGTGGATA TTTTGCACAA GCGTCAGGAT 16380 

ATGCGAATCA TACAGGAGCG AGATTACATT ACTCATAGAA AAGCATCAGG CTATCGTTCC 16440 

TATCATGTGG TAGTAGAATA TACGGTTGAT ACCATCAATG GAGCTAAGAC TATTTTGGCA 16500 

GAAATTCAAA TTCGTACTTT GGCCATGAAT TTCTGGGCAA CGATAGAACA TTCTCTCAAC 16560 

TACAAGTACC AAGGGGATTT CCCAGATGAG ATTAAGAAGC GACTGGAAAT TACAGCTAGA 16620 

ATCGCCCATC AGTTGGATGA AGAAATGGGT GAAATTCGTG ATGATATCCA AGAAGCCCAG 16680 

GCACTTTTTG ATCCTTTGAG TAGAAAATTA AATGACGGTG TAGGAAACAG TGACGATACA 16740 

GATGAAGAAT ACAGGTAAAC GAATTGATCT GATAGCCAAT AGAAAACCGC AGAGTCAAAG 16800 

GGTTTTGTAT GAATTGCGAG ATCGTTTGAA GAGAAATCAG TTTATACTCA ATGATACCAA 16860 

TCCGGATATT GTCATTTCCA TTGGCGGGGA TGGTATGCTC TTGTCGGCCT TTCATAAGTA 16920 

CGAAAATCAG CTTGACAAGG TCCGCTTTAT CGGTCTTCAT ACTGGACATT TGGGCTTCTA 16980 

TACAGATTAT CGTGATTTTG AGTTGGACAA GCTAGTGACT AATTTGCAGC TAGATACTGG 17040 

GGCAAGGGTT TCTTACCCTG TTCTGAATGT GAAGGTCTTT CTTGAAAATG GTGAAGTTAA 17100 

GATTTTCAGA GCACTCAACG AAGCCAGCAT CCGCAGGTCT GATCGAACCA TGGTGGCAGA 17160 

TATTGTAATA AATGGTGTTC CCTTTGAACG TTTTCGTGGA GACGGGCTAA CAGTTTCGAC 17220

ACCGACTGGT AGTACTGCCT ATAACAAGTC TCTTGGCGGT GCTGTTTTAC ACCCTACCAT 17280 

TGAAGCTTTG CAATTAACGG AAATTGCCAG CCTTAATAAT CGTGTCTATC GAACACTGGG 17340 

CTCTTCCATT ATTGTGCCTA AGAAGGATAA GATTGAACTT ATTCCAACAA GAAACGATTA 17400 

TCATACTATT TCGGTTGACA ATAGCGTTTA TTCTTTCCGT AATATTGAGC GTATTGAGTA 17460

TCAAATCGAC CATCATAAGA TTCACTTTGT CGCGACTCCT AGCCATACCA GTTTCTGGAA 17520 

CCGTGTTAAG GACGCCTTTA TCGGCGAGGT GGATGAATGA GGTTTGAATT TATCGCAGAT 17580

GAACATGTCA AGGTTAAGAC CTTCTTAAAA AAGCACGAGG TTTCTAAGGG ATTGCTGGCC 17640

AAGATTAAGT TTCGAGGTGG AGCTATTCTG GTCAATAATC AACCGCAAAA TGCAACGTAT 17700 

CTATTGGACG TTGGAGACTA CGTTACCATT GACATTCCCG CTGAGAAAGG CTTTGAAACC 17760 

TTGGAGGCTA TTGAGCTTCC ATTAGATATT CTCTATGAGG ATGACCACTT TCTAGTCTTG 17820 

AATAAACCCT ATGGAGTGGC TTCTATTCCT AGTGTCAATC ACTCTAATAC CATTGCCAAT 17880 

TTTATCAAGG GTTACTATGT CAAGCAAAAT TATGAAAATC AGCAGGTTCA CATTGTTACC 17940 

AGACTAGATA GGGATACTTC TGGCTTGATG CTCTTTGCCA AGCACGGTTA TGCCCATGCA 18000 

CGATTAGACA AGCAGTTGCA GAAGAAATCT ATCGAGAAAC GCTACTTTGC TTTGGTTAAG 18060 

GGAGATGGAC ATTTGGAGCC AGAAGGGGAA ATTATTGCTC CGATTGCGCG TGATGAAGAT 18120 

TCCATTATTA CCAGACGAGT GGCTAAAGGC GGAAAGTATG CCCATACTTC ATACAAGATT 18180 

GTAGCTTCTT ATGGAAATAT TCACTTGGTC TATATTCACC TGCACACTGG TCGAACCCAT 18240 

CAAATCCGAG TCCATTTTTC TCATATCGGT TTTCCTTTGC TGGGAGATGA TTTGTATGGT 18300 

GGTAGTCTGG AAGATGGTAT TCAACGTCAG GCTCTGCATT GCCATTACCT ATCCTTTTAT 18360 

CATCCATTTT TAGAGCAAGA CTTGCAGTTA GAAAGTCCCT TGCCGGATGA TTTTAGTAAC 18420 

CTTATTACCC AGTTATCAAC TAATACTCTA TAAAAACTGT CTCAGAGTAT AATTATTATC 18480 

TTAAAGGAGA AAACTCATGG AAGTTTTTGA AAGTCTCAAA GCCAACCTTG TTGGTAAAAA 18540 

TGCTCGTATC GTTCTCCCTG AAGGGGAAGA GCCTCGTATT CTTCAAGCAA CAAAACGCTT 18600 

AGTAAAAGAA ACAGAAGTGA TTCCTGTTTT GCTTGGAAAT CCTGAAAAAA TTAAAATTTA 18660 

TCTTGAAATT GAAGGAATCA TGGATGGTTA TGAGGTCATC GACCCTCAAC ATTATCCTCA 18720 

ATTTGAAGAA ATGGTTTCTG CCTTGGTGGA GCGTCGCAAG GGCAAAATGA CTGAAGAAGA 18780 

TGTACGCAAG GTTTTGGTTG AAGATGTCAA CTACTTTGGT GTGATGTTGG TTTACTTGGG 18840 

CTTGGTTGAT GGAATGGTGT CAGGAGCGAT TCACTCAACA GCTTCAACAG TTCGCCCAGC 18900 

TCTACAAATC ATCAAAACTC GTCCAAATGT AACTCGTACT TCAGGAGCCT TCCTCATGGT 18960 

TCGTGGTACG GAACGTTACC TATTTGGAGA CTGTGCCATT AACATCAATC CAGATGCAGA 19020 

AGCCTTGGCT GAAATTGCCA TCAACTCAGC AATCACAGCT AAGATGTTTG GCATCGAACC 19080 

TAAAATTGCC ATGTTGAGCT ATTCTACTAA AGGTTCAGGG TTTGGTGAAA GCGTTGATAA 19140 

GGTCGTTGAA GCAACTAAAA TTGCTCACGA CTTGCGTCCT GACCTTGAAA TCGATGGTGA 19200 

GTTGCAATTT GATGCAGCCT TTGTTCCTGA AACTGCAGCT CTGAAAGCTC CTGGAAGTAC 19260 

GGTAGCTGGT CAAGCAAATG TCTTCATCTT CCCAGGTATC GAGGCAGGAA ATATTGGTTA 19320

CAAGATGGCT GAACGCCTGG GTGGCTTTGC GGCTGTAGGA CCTGTTTTGC AAGGTTTAAA 19380

CAAGCCAGTT AATGATCTTT CTCGTGGATG TAATGCAGAT GATGTTTACA AGTTGACCCT 19440 

CATCACAGCA GCTCAAGCAG TTCATCAATA GTGAAAACTA TAAAGTGATA TACTATGCTA 19500 

TACTGTAGTT ATGAAACTAT GTACGAAAAG CACTGCCATT AATTCCTGAG AACTAAATTA 19560 

CTGATTGGTG TCAAAAAGGA AAACTTCCAA GCGATGATAT CCTGTCTATA CACGACCTAT 19620 

AGAAATCTGT AATATACATA TCCGTAAAAC GATAAATTCC CTTTTTGATT TTAAATGAGT 19680 

ATGAAAAGAG AATTTTTTGG CTCTTTGTCA ACTGTAGTGG GTTGAAGAAA AGCTAAGCTC 19740 

GAGAAAGGAC AAATTTCATC CTTTCTTTTT TGATATTCAG AGCGATAAAA ATCCGTTTTT 19800 

TGAAGTTTTC AAAGTTCCGA AAACCAAAGG CATTGCGCTT GATAAGTTTG ATGAGATTAT 19860 

TGGTCGCTTC CAGTTTGGCG TTAGAATAGT GTAGTTGAAG GGCGTTGATA ATCTTTTCTT 19920 

TATCTTTGAG GAAGGTTTTA AAGACAGTCT GAAAAATAGG ATGAACCTGC TTAAGATTGT 19980 

CCTCAATAAG TCCGAAAAAT TTCTCTGGTT CCTTATTCTG GAAGTGAAAA AGCAAGAGTT 20040 

GATAGAGCTG ATAGTGGTGT TTCAAGTCTT CCGAATAGCT CAAAAGCTTG TTTAAAATCT 20100 

CTTTATTGGT TAAGTGCATA CGAAAAATAG GACGATAAAA TCGCTTATCA CTCAGTTTAC 20160 

GGCTATCCTG TTGAATGAGT TTCCAGTAGC GCTTGATAG 20199 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19702 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

ACCCGATGTA TCAGCGGATA TTTACTCTAT TTTTCAAACG ATGTTATACC CACAATAAAA 60 

GAAAAAAGAC CCTAAGGTCT CCTTTGCTTT TATTATTAAA CGCGTTCAAC TTTACCTGAT 120 

TTCAAAGCAC GAGCTGAAGC CCAAACTTTT TTAGGTTTAC CATCGATAAG AACAGTAACT 180 

TTTTGAAGGT TTGGTTTTAC GGCACGTTTT GTTTGGTTCA TCGCGTGTGA ACGGTTGTTT 240 

CCTGATACAG TCTTACGACC TGTAAAGTAA CATACTTTAG CCATTGTGTT TTCCTCCTAT 300 

TAGATCTAAT ATAGCGGATG TGCTAGCACC ACATACCGTA CTATGTTATC ACATTTTCTT 360 

GTTTTTTGCA AGGGAATTGG AAGATTTTTT ATTTGTGTCT TAAATCAGGT CTTGCGTGAC 420 

ATTTCTGCTC TCCACATGCC ATCGTTGATT AACAGAACAC CAGAATTAAA ATTATGTGTA 480

TAAAAATCAT CTCTAACTGC AGCTAAGGGT ATAGCCGTCA AGTCCAAATC CCACAGCTCA 540

TCTATCGATT TTCTTACAAC AATATCTGAA TCCAAATACA GTACACGAGA CTCGCTTACA 600

TACTTTGGAA TAAAATACCT AAAAAAGCCG CATATGAAAG TCCCTCAAAG GGGAGACGAT 660 

AACCTTTCAG AATATTACTG TCAATCTAAA CATTCACAAT CTCACTATTC AAAGTCTCTA 720 

GTCTTTTTTC CATCAATTGG AACCATTCTC GCGGAAGGTC ATCATTAAAA ACATAAAACT 780 

TAAGATTATA ATGATGAACA CAAAGAGATT TTATTGTTGT TTCAACTTTA TCCATATAAG 840 

CATTATCTGC ACCTAAGACA ATCGCTTTTT TCTCTTCTTT CACTTTTTAT CTCATTTCTT 900 

TTTATTCCCA TCATATTATT CCCATCATAT GTTTCCCATC ATATGTTTCT ACGTAACCAT 960 

TATTTTCGCC TATTCGTTCG TAAAACCATA CCAGTGGAGA TTTTAGATGA AGTCCCATTA 1020 

CGGTTTACAA TTTTTACATT ACGACACGGA GTTTTACAAA TCGATTTCAT TTGCCAAACG 1080 

TAGTTAGTGA GGCAGTTAGC TAGTTCGCCA AATAGCGACT AGCGTCCAAC AATTTGGAAC 1140 

TTTAGTTCCA ATTGTTGGTA CTGAGTCACA TCTTCTCCTC TAACTCTACG TCTGGATACT 1200 

TGTCCGCAAA CCAGCGGAGG GCAAAGTCAT TTTCAAAGAG AAAGACTGGT TGGTCAAAAC 1260 

GGTCTTTGGC TAAGATATTG CGACTTGACG ACATCCGTTC ATCCAAGTCC TCAGGCTTGA 1320 

TCCAACGAAC GGTCTTTTTA CCCATTGGGT TCATAACTAC TTCCGCATTG TACTCGCCTT 1380

CCATGCGGTG TTTAAAGACT TCAAACTGGA GTTGACCTAC AGCGCCTAGC ATGTACTCAC 1440 

CTGTTTGGTA ATTCTTATAA AGCTGAACGG CTCCTTCTTG CACCAATTGC TCAATCCCCT 1500 

TGTGGAAGGA TTTTTGCTTC ATAACATTCT TAGCAGAAAC TTTCATGAAA ATCTCAGGTG 1560

TAAAGGTTGG CAGGGGTTCA AATTCAAACT TGTTTTTTCC AACCGTCAAG GTATCCCCAA 1620 

CCTGATAAGT ACCGGTATCG TAAACCCCGA TAATATCACC TGCCACGGCA TTGGTCACAT 1680 

TCTCACGACT CTCCGCCATA AACTGGGTAA CATTAGATAG TTTAGCCCCC TTACCAGTAC 1740 

GAGGGAGATT GACACTCATG CCGCGCTCAA ATTCGCCAGA TACGATACGG ACAAAGGCAA 1800 

TACGGTCACG GTGACGAGGG TCCATGTTGG CTTGGATTTT AAAGACAAAG CCTGAGAAAT 1860

CCTTGTCATA AGGATCCACA ATTTCACCGT CTGTTTTCTT GTGACCATGT GGTTCTGGAG 1920 

CAAACTTGAG GAAGGTTTCA AGGAAGGTCT GCACACCAAA GTTTGTCAGG GCTGAACCGA 1980 

AAAAGACAGG CGTCAATTCT CCAGCCAGAA TAGCTTCCTC TGAAAACTCA TTCCCGGCTT 2040 

CATTTAAAAG CTCAATGTCA TCCTTGACTT GCTCGTAGAA AGGATTGCTA CCAAAGAGTT 2100 

TGTCCCCGTC TTCTAGACTG GCAAAACGCT CATCCCCTTT GTAAAGCTCT AAACGTTGGT 2160 

TATAGAGGTC ATACAAGCCC TCAAAGGCTT TCCCCATCCC GATAGGCCAG TTCATAGGGT 2220 

AGCTAGCAAT GCCCAAGATT TCTTCCAATT CTTGCAAGAG ATCCAAAGGC TCACGACCGT 2280

CACGGTCCAG CTTGTTCATA AAGGTAAAGA CTGGAATGCC ACGATGTTTC ACAACCTCAA 2340

ACAATTTCTT GGTTTGAGCC TCGATCCCCT TGGCAGAGTC CACGACCATG ACCGCAGCAT 2400 

CCACCGCCAT CAAGGTACGA TAGGTATCTT CTGAGAAGTC CTCGTGCCCT GGCGTGTCTA 2460 

AGATATTCAC GCGCTTGCCG TCGTAGTCAA ATTGCATAAC AGATGAAGTA ACAGAAATCC 2520 

CACGTTGCTT CTCGATATCC ATCCAGTCAG ATTTAGCAAA AGTCCCTGTT TTCTTCCCTT 2580 

TTACCGTACC AGCCTCACGA ATCTCACCCC CAAAGTAGAG TAACTGCTCA GTGATGGTTG 2640 

TTTTCCCCGC GTCCGGGTGG GAGATAATGG CAAAGGTACG ACGTTTCTTA ATTTCTTCTT 2700 

GAATATTCAT AAGTTCTCTT TCTTTGATTC TCTATTTTTC TTGTTTCAAT AGCTGAGAAT 2760 

GATTTTTACA TTGGATTTTA CCATTCCTTT CAACACTCCA TTATATCGGA TTTTAGCATT 2820 

TTTTTCAATT TCTATTTCTT TTCACTTCCC CCTCCCTTAT TTATAGGAAA ATATGGTAAA 2880 

ATAGAACAGA CTAAAAATCA TCATTTCACG AAAGGATGCA AGATGAAAAT TACGCAAGAA 2940 

GAGGTAACAC ACGTTGCCAA TCTTTCAAAA TTAAGATTCT CTGAAGAAGA AACTGCTGCC 3000 

TTTGCGACCA CCTTGTCTAA GATTGTTGAC ATGGTTGAAT TGCTGGGCGA AGTTGACACA 3060 

ACTGGTGTCG CACCTACTAC GACTATGGCT GACCGCAAGA CTGTACTCCG CCCTGATGTG 3120 

GCCGAAGAAG GAATAGACCG TGATCGCTTG TTTAAAAACG TACCTGAAAA AGACAACTAC 3180 

TATATCAAGG TGCCAGCTAT CCTAGACAAT GGAGGAGATG CCTAATGACT TTTAACAATA 3240 

AAACTATTGA AGAGTTGCAC AATCTCCTTG TCTCTAAGGA AATTTCTGCA ACAGAATTGA 3300 

CCCAAGCAAC ACTTGAAAAT ATCAAGTCTC GTGAGGAAGC CCTCAATTCA TTTGTCACCA 3360 

TCGCTGAGGA GCAAGCTCTT GTTCAAGCTA AAGCCATTGA TGAAGCTGGA ATTGATGCTG 3420 

ACAATGTCCT TTCAGGAATT CCACTTGCTG TTAAGGATAA CATCTCTACA GACGGTATTC 3480 

TCACAACTGC TGCCTCAAAA ATGCTCTACA ACTATGAGCC AATCTTTGAT GCGACAGCTG 3540 

TTGCCAATGC AAAAACCAAG GGCATGATTG TCGTTGGAAA GACCAACATG GACGAATTTG 3600 

CTATGGGTGG TTCAGGTGAA ACTTCACACT ACGGAGCAAC TAAAAACGCT TGGAACCACA 3660 

GCAAGGTTCC TGGTGGGTCA TCAAGTGGTT CTGCCGCAGC TGTAGCCTCA GGACAAGTTC 3720

GCTTGTCACT TGGTTCTGAT ACTGGTGGTT CCATCCGCCA ACCTGCTGCC TTCAACGGAA 3780 

TCGTTGGTCT CAAACCAACC TACGGAACAG TTTCACGTTT CGGTCTCATT GCCTTTGGTA 3840 

GCTCATTAGA CCAGATTGGA CCTTTTGCTC CTACTGTTAA GGAAAATGCC CTCTTGCTCA 3900 

ACGCTATTGC CAGCGAAGAT GCTAAAGACT CTACTTCTGC TCCTGTCCGC ATCGCCGACT 3960 

TTACTTCAAA AATCGGCCAA GACATCAAGG GTATGAAAAT CGCTTTGCCT AAGGAATACC 4020 

TAGGCGAAGG AATTGATCCA GAGGTTAAGG AAACAATCTT AAACGCGGCC AAACACTTTG 4080

AAAAATTGGG TGCTATCGTC GAAGAAGTCA GCCTTCCTCA CTCTAAATAC GGTGTTGCCG 4140

TTTATTACAT CATCGCTTCA TCAGAAGCTT CATCAAACTT GCAACGCTTC GACGGTATCC 4200 

GTTACGGCTA TCGCGCAGAA GATGCAACCA ACCTTGATGA AATCTATGTA AACAGCCGAA 4260 

GCCAAGGTTT TGGTGAAGAG GTAAAACGTC GTATCATGCT GGGTACTTTC AGTCTTTCAT 4320 

CAGGTTACTA TGATGCCTAC TACAAAAAGG CTGGTCAAGT CCGTACCCTC ATCATTCAAG 4380 

ATTTCGAAAA AGTCTTCGCG GATTACGATT TGATTTTGGG TCCAACTGCT CCAAGTGTTG 4440 

CCTATGACTT GGATTCTCTC AACCATGACC CAGTTGCCAT GTACTTAGCC GACCTATTGA 4500 

CCATACCTGT AAACTTGGCA GGACTGCCTG GAATTTCGAT TCCTGCTGGA TTCTCTCAAG 4560 

GTCTACCTGT CGGACTCCAA TTGATTGGTC CCAAGTACTC TGAGGAAACC ATTTACCAAG 4620 

CTGCTGCTGC TTTTGAAGCA ACAACAGACT ACCACAAACA ACAACCCGTG ATTTTTGGAG 4680 

GTGACAACTA ATGAACTTTG AAACAGTCAT CGGACTTGAA GTCCACGTAG AGCTCAACAC 4740 

CAATTCAAAA ATCTTCTCAC CTACTTCTGC CCACTTTGGA AATGACCAAA ATGCCAACAC 4800 

TAACGTGATT GACTGGTCTT TCCCAGGAGT TCTACCAGTT CTCAATAAAG GGGTTGTTGA 4860 

TGCCGGTATC AAGGCTGCTC TTGCCCTCAA CATGGACATC CACAAAAAGA TGCACTTTGA 4920 

CCGCAAGAAC TACTTCTATC CTGATAACCC CAAAGCCTAC CAAATTTCTC AGTTTGATGA 4980 

ACCAATCGGA TATAATGGCT GGATTGAAGT CAAACTAGAA GACGGTACGA CCAAGAAAAT 5040 

CGGTATCGAA CGTGCCCACC TAGAGGAAGA CGCTGGTAAA AACACCCATG GTACAGATGG 5100 

CTACTCTTAT GTTGACCTCA ACCGCCAAGG GGTTCCCTTG ATTGAGATTG TATCTGAGGC 5160 

AGATATGCGT TCTCCTGAAG AAGCCTATGC TTATCTGACA GCCCTCAAGG AAGTTATCCA 5220 

GTACGCTGGC ATTTCTGACG TTAAGATGGA GGAAGGTTCG ATGCGTGTGG ATGCCAACAT 5280 

CTCCCTTCGT CCTTATGGTC AAGAGAAATT CGGTACCAAG ACTGAATTGA AGAACCTCAA 5340 

CTCCTTCTCA AACGTTCGTA AAGGTCTTGA ATACGAAGTC CAACGCCAGG CTGAAATTCT 5400 

TCGCTCAGGT GGTCAAATCC GCCAAGAAAC ACGCCGTTAC GATGAAGCGA ATAAAGCAAC 5460 

CATCCTCATG CGTGTCAAGG AAGGGGCTGC TGACTACCGC TACTTCCCAG AACCAGACCT 5520 

ACCCCTCTTT GAAATTTCTG ACGAGTGGAT TGAGGAAATG CGGACTGAGT TGCCAGAGTT 5580 

TCCAAAAGAA CGTCGTGCGC GTTATGTATC TGACCTTGGT TTATCAGACT ACGATGCTAG 5640 

TCAGTTGACT GCTAATAAAG TCACTTCTGA CTTCTTTGAA AAAGCTGTTG CCCTAGGTGG 5700 

TGATGCCAAA CAAGTCTCTA ACTGGCTCCA AGGGGAAGTC GCTCAGTTCT TGAATGCTGA 5760 

AGGTAAAACA CTGGAACAAA TCGAATTGAC ACCAGAAAAC TTGGTTGAAA TGATTGCCAT 5820

CATCGAAGAC GGTACTATTT CATCTAAGAT TGCCAAGAAA GTCTTTGTCC ATCTAGCTAA 5880

AAATGGCGGT GGCGCGCGTG AATACGTGGA AAAAGCAGGT ATGGTTCAAA TTTCAGATCC 5940 

AGCTATCTTG ATCCCAATCA TCCACCAAGT CTTTGCCGAT AACGAAGCTG CTGTTGCCGA 6000 

CTTCAAGTCA GGCAAACGTA ACGCCGACAA GGCTTTACAG GATTCCTTAT GAAGGCAACC 6060

AAAGGCCAAG CCAACCCACA AGTTGCCCTT AAACTACTTG CACAGGAATT GGCGAAGTTG 6120 

AAAGAAAACT AGACAGAACA AAACCAGCCC TAAGGTTGGT TTTTTCTTCT CTACCAACTC 6180 

CCAATAACTA TTTTGGCTTT ATTTCCAGAG TATTTTATGG TAAAATGAAG AGTAATAATA 6240

TTTATTAAAG AGGTAAAAAC ATGATTGAAG CAAGTACCTT AAAAGCTGGT ATGACCTTTG 6300 

AAACAGCTGA CGGCAAATTG ATTCGCGTTT TGGAAGCTAG TCACCACAAA CCAGGTAAAG 6360 

GAAACACGAT CATGCGTATG AAATTGCGTG ATGTCCGTAC TGGTTCTACA TTTGACACAA 6420 

GCTACCGTCC AGAGGAAAAA TTTGAACAAG CTATTATCGA GACTGTCCCA GCTCAATACT 6480 

TGTACAAAAT GGATGACACA GCATACTTCA TGAATACAGA AACTTATGAC CAATACGAAA 6540 

TCCCTGTAGT CAATGTTGAA AACGAATTGC TTTACATCCT TGAAAACTCT GATGTGAAAA 6600 

TCCAATTCTA CGGAACTGAA GTGATCGGTG TCACCGTTCC TACTACTGTT GAGTTGACAG 6660 

TTGCTGAAAC TCAACCATCT ATCAAAGGTG CTACTGTTAC AGGTTCTGGT AAACCAGCAA 6720 

CGATGGAAAC TGGACTTGTC GTAAACGTTC CAGACTTCAT CGAAGCAGGA CAAAAACTCG 6780 

TTATCAACAC TGCAGAAGGA ACTTACGTTT CTCGTGCCTA ATCTCTAGAA AGAGGTCATT 6840 

CTATGGGAAT TGAAGAACAA CTTGGCGAAA TCGTTATCGC CCCACGTGTA CTTGAAAAAA 6900 

TCATTGCTAT CGCTACTGCA AAGGTAGAGG GTGTTCACTC TTTTTCAAAC AGATCAGTGT 6960

CTGATACCCT TTCAAAACTT TCACTCGGCC GTGGCATTTA TCTTAAAAAC GTGGACGAAG 7020 

AACTCACAGC AGATATCTAT CTCTACCTTG AGTACGGAGT AAAAGTTCCT AAGGTAGCGG 7080 

TTGCTATCCA GAAAGCTGTC AAAGATGCCG TCCGTAATAT GGCTGATGTA GAACTCGCTG 7140 

CTATCAATAT TCACGTTGCA GGTATCGTCC CAGATAAAAC ACCAAAACCA GAATTGAAAG 7200 

ATCTATTTGA CGAGGACTTC CTCAATGACT AGTCCACTAT TAGAATCTAG ACGCCAACTC 7260 

CGTAAATGCG CTTTTCAAGC TCTCATGAGC CTTGAGTTCG GTACGGATGT CGAAACTGCT 7320 

TGTCGTTTCG CCTATACTCA TGATCGTGAA GATACGGATG TACAACTTCC AGCCTTTTTG 7380 

ATAGACCTCG TTTCTGGTGT TCAAGCTAAA AAGGAAGAAC TAGATAAGCA AATCACTCAG 7440 

CATTTAAAAG CAGGTTGGAC CATTGAACGC TTAACGCTCG TGGAGAGAAA CCTCCTTCGC 7500 

TTGGGAGTCT TTGAAATCAC TTCATTTGAC ACTCCTCAGC TGGTTGCTGT TAATGAAGCT 7560 

ATCGAGCTTG CAAAGGACTT CTCCGATCAA AAATCTGCCC GTTTTATCAA TGGACTGCTC 7620

AGCCAGTTTG TAACAGAAGA ACAATAAGGC TCTTTGTCAA CTGTAGTGGG TTGAAAAAAA 7680

GCTAAGCTCG AGAAAGGACA AATTTCGTCC TTTCTTTTTT GATGTTCAAA GCGATAAAAA 7740 

TCCGTTTTTT GAAGTTTTCA AAGTTTCGAA AACCAAAGGC ATTGCGCTTG ATAAGTTTGA 7800 

TGAGATTATT GGTCGCTTCC AGTTTGGCAT TAGAATAGTG TAGTTGAAGG GCGTTGACAA 7860

TCTTTTCTTT ATCTTTGAGG AAGGTTTTAA AGACAGTCTG AAAAATAGGA TGAGCCTGCT 7920 

TAAGATTGTC CTCAATAAGT CCGAAAAATT TCTCTGGTTC CTTATTCTGG AAGTGAAACA 7980 

GCAAGAGCTG ATAGAGCTGA TAGTGGTGTT TCAAGTCTTG TGAATGGCTC AAAAGCTTGT 8040 

CTAAAATCTC TTTATTGGTT AAGTGCATAC GAAAAGTAGG ACGATAAAAT CGCTTATCAC 8100 

TCAGTCTACG GCTATCCTGT TGAATGAGTT TCCAGTAGCG CTTGATATCC TTGTATTCAT 8160 

GGGATTTTCG ATGAAACTGA TTCATGATTT GGACACGCAC ACGACTCATG GCACGGCTAA 8220

GATGTTGTAC AATGTGAAAG CGATCAAGAA CGATTTTAGC ATTCGGGAGT GAAACAGTCT 8280 

GGGAGACTGT TTCAGCCTGA GCCTAGGAAT TTGAAAGCGA AGCTGTTTAG CCAAGTCATA 8340 

GTAAGGGCTA AACATATCCA TAGTAATAAT TTTGACGCGA CATCGGACAA CTCTATCGTA 8400 

GCGAAGAAAG TGATTTCGAA TGATAGCTTG TGTTCTACCC TCAAGAACAG TGATGATATT 8460 

GAGATTGTTA AAATCTTGCG CAATGAAGCT CATCTTTCCC TTTGTAAAAG CATACTCATC 8520 

CCAAGACATA ATCTCAGGAA GACAAGAAAA ATCATGTTTA AAGTGAAAAT CATTGAGCTT 8580 

ACGAATAACA GTTGAAGTTG AGATGGAAAG CTGATGGGCA ATATCAGTCA TAGAAATCTT 8640 

TTCAATCAAC TTTTGAGCAA TCTTTTGGTT GATGATACGA GGGATTTGGT GATTTTTCTT 8700 

GACGATAGAA GTTTCAGCGA CCATCATTTT TGAACAGTGA TAGCACTTGA ATCGACGCTT 8760 

TCTAAGGAGA ATTCTAGTAG GCATACCAGT CGTTTCAAGA TAAGGAATTT TAGAAGGTTT 8820 

TTGAAAGTCA TATTTCTTCA ATTGGTTTCC GCACTCAGGG CAAGATGGGG CGTCGTAGTC 8880 

CAGTTTGGCG ATGATTTCCT TGTGTGTATC CTTATTGATG ATGTCTAAAA TCTGGATATT 8940 

AGGGTCTTTA ATGTCTAGTA ATTTTGTGAT AAAATGTAAT TGTTCCATAT GAATCTTTCT 9000 

AATGAGTTGT TTTGTCGCTT TTCATTATAG GTCATATGGG ACTTTTTTTC TACAATAAAA 9060 

TAGGCTCCAT AATATCTATA GGGGATTTAC CCACTACAAA TATTATAGAG CCAACAATAA 9120 

AAAGAAAAAG TGTTTGATAG ATATCAAACA CTTTTTTCTT TGCCTCCCAC TATCTAAAAA 9180 

AATGATAATA GATATAATTG TAAACAAAAA TCCAGATAGG TTTTGCATGA TTGAGAAAGT 9240 

TAAAAAAACT ATGGCAGAGA ATCGTTAATC TCAGATTGTC GGTAGAACGA TAAACAAGGG 9300

CAAAAAAGAA ACCAATCAGA CTATAATATA ATAAACTAAT TGGATCTCTG TGAGATAGTA 9360

TCAAATGGCT AATCCCAAAG ATGATAGCAG ATAGGATAAC ATCCAAATAG TACTTGGACT 9420

AGGGAAAGAA GGTATTCATA AAATACCCTC TATCAAGAGT CTCCTCAAAA ACAGGACCGA 9480 

TGATTACAGG CAGGACAAAA GATAAGATAG TCGATAAAAA GGTTGGTTGT CCATTTGAAA 9540 

AAAGCACGGT AAAATACTCA TCATGAATAT TCCTATGATT AATCAAATGA GCATAGCGTG 9600 

CCCAAAAATT ACCGAGAATC TGATAAACCA CATAAGTTGC AAATAAGTAG AAGACAAATG 9660 

ACCAGTTCCA GCTCTTTTTC TCAAAGATAA AGAGCATCTT TTTCTTTTTT AACCTCCAAA 9720 

TTAATAGAAG GAAACTTCCC ACTAATCCCA TTGTTAAAAT AAGAGAATAG ACATCAGCTC 9780 

CTAACCCTAA AATGATCGTC ACATACAATC CAATTGTTTG TGGTAAATAG GTAGATAGTA 9840 

AAATAATAAG CAAAAATATT CCAAATTGTC TTAGTTTTTT TGTGTTTCTC ATCGTACTTT 9900 

TTTGAAAGAT TACCCTGCTC GGAAGCCGTA CTTCCAAGCA TCTATATAAG AATTAAGTGC 9960 

CCCTTGCCTC ATATAGGGAG CAAATTCTCT ATAATATAAC CATCTACTAT ATCCATCTTC 10020 

CCAAACAGCA AGACCACCTG AAGTTTGCTC CAAGTCCTCA GTTGAAAGAA CTGTAAATGT 10080 

ATTTGTACCT GTCATTGCAA GTACCTTCTT AAAATAGATT GTTGTAGGCT CACATTTATA 10140 

GTATATTTCT TTTTTTGTCT ATTTTATAGC CCATCTCCTC AACTGGCAAT TTTTCGACCT 10200 

GAATTACATT TTTCCATAAA AAATGAGACC TTTCTAGTCT CATTTAGTCA TTCTTAGTAT 10260 

TTTCTAAATC GTTGATAGCG TTCTTCCAGC AACTCTTCTA GCGGTTTTTG TGAAAGTCTA 10320 

GCCAGCTCCG TTTGGAGTTC TTTTTTGACA CTCTTAATCA GTTCTTTACT AGAAAGTCCT 10380 

ATTTCAGAAA TCACCTTATC CACCACGTCC ATTTCTAACA GTTCATGCGA AGTGATTTTC 10440 

ATCAGTTCTG CTGCTTCCAT AGCGCGAGTA CCGTCCTTCC ATAAAATGGA AGCAAAGCCT 10500 

TCTGGACTGA GAATGGCATA GATAGAATTT TCCAGCATCC AGACACGGTC CGCGACAGCT 10560 

AGAGCCAGAG CCCCGCCTGA ACCACCTTCA CCGATAATAA TGGCGATAAT AGGAACTTTC 10620 

AGGTCACTCA TTTCCATGAG ATTGCGAGCG ATAGCTTCCC CTTGACCACG TTCTTCCGCT 10680 

CCGACACCAG GATAAGCACC TGCTGTATTG ATAAAGGTCA CAACTGGACG GCCAAATTTC 10740 

TCAGCCTGTT TCATCAACCG CAGTGCCTTT CGGTAGCCTT CTGGATGTGG TTGGCCAAAA 10800 

TTCCGTTTGA GGTTGTCTTG CAAACTCTTG CCTTTTTGGA TACCAACCAC TGTTACAGCT 10860 

TGGTCTCCAA GCCAACCAAT ACCACCAACA ACTGCACCAT CATCACGAAA AGAACGGTCA 10920 

CCATGTAATT GGATAAATTC ATCAAAAATG CCTGTCGCAA AGTCCAAGGT TGTCAAGCGA 10980 

CTCTGCTCAC GCGCTTCTCT GACTATTTTT GCAATATTCA TCTAGGACTC CCTCCATGCA 11040 

ATCTGACTAG GCTAGCAATC GTATCTGGTA AGTCTCTTCT TTTGACAATA GCATCCACAA 11100 

AGCCATGTTC TAATAGGAAT TCTGCCTTTT GGAAATCCTC AGGCAAGCTT TCACGAACCG 11160

TATTTTCAAT CACACGACGC CCAGCAAAAC CAACCAAGCT CTGTGGTTCA GCCAGAATGA 11220

TATCGCCTTC CATAGCGAAA GAAGCTGTCA CACCACCAGT CGTTGGATCT GTCAAAATGG 11280 

TCAGGTAAAA GAGACCAGCA TTTGAATGGC GTTTAACCGC CGCAGAGATC TTAGCCATCT 11340 

GCATGAGACT CATGATTCCT TCCTGCATAC GGGCTCCACC AGAGGCTGTG AATAGGACAA 11400 

CTGGCAATTT TTCGACAGTC GCATACTCAA ACAAACGAGT GATTTTTTCA CCTACAACCG 11460 

TACCCATAGA AGCCATGATA AAGTTAGAAT CCATAATCCC AAGAGCCACA GTCTGACCTT 11520 

TAATAAGAGC AGTTCCTGTC ACAACGGCTT CATGCAGACC TGTTTTTTCA CGCATAGATG 11580 

CCAGTTTCTT TTGGTAACCA GGGAAATGCA AGGGATCCTT GCTTTCAATC CCTGTAAACA 11640 

ATTCTTTGAA GGTTCCCATA TCAATCGTCA AAGCCAAGCG TTCTTGGGCA GAAATACGAA 11700 

AGGTATAGCT ACAGTGCGGA CAGATACGTT CACTTCCCAG ATCCTTCTGA TAGATGGTAT 11760 

GCTTACAGCC TGGACACTGG GAAAATAATT CATCTGGAAC CTCTGGCTTA GCTTGAGGTT 11820 

TTTCCCTAAC CGAACGATTG GGATTGATTC GAATATACTT ATCTTTTTTA CTAAATAGAG 11880 

CCATTGATTC CCCTTTTCGG TTTAAACTCT TAAAGTCATT TTATTCTTTT TCTTGATATT 11940 

TAGGTAAGAA GGTTTCCATC AAGAAGGAAG TATCATAATC CCCAGCAATG ACATTGCGAT 12000 

CTGAAATGAG GTCAAGCTGG AAATCTGCAT TGGTCTGCAC TCCTTCAATT TCTAATTCAT 12060 

AGAGGGCACG TTGCATTTTC ATCAAGGCGT CAAAACGATT TTCGCCGTGT ACTATGATTT 12120 

TGGCAATCAT ACTATCATAA TAAGGCGGAA TGGTATAACC TGGATAAACT GCTGAATCCA 12180

CGCGCAAGCC AACTCCACCA CTTGGCAGAT AGAGATTAGT AATCTTACCT GGACTTGGAG 12240 

CAAAGTTAAA GGCTGGGTTT TCTGCATTGA TACGACACTC GATGGCATGA CCGCGTAGGA 12300 

CAATATCTTC TTGCTTAACA GACAAAGGCT GACCTGCCGC AATGCAAATC TGTTCCTTAA 12360 

CGATATCAAC ACCTGAAACA AACTCTGTTA CTGGATGTTC TACCTGAACA CGAGTATTCA 12420 

TCTCCATGAA ATAGAAATTG CTACTTGCTT CATCAAGAAG AAATTCAATG GTTCCTGCAT 12480 

TCTCATAGCC AACAAACTCT GCCGCTOGAA CAGCAGCAGC ACCTATTTCA TGACGCAGCG 12540 

TTTTTCCGAT TGCAATCGAG GGACTTTCTT CCAAAACCTT TTGGTTATTC CTTTGAAGAG 12600 

AACAATCCCG TTCACCCAAG TGAATCACAT GTCCATGCTC ATCACCTAGG ATTTGAACCT 12660 

CAATGTGCCG AGCTGGATAG ATAACCCGTT CTATGTACAT GGCACCATTG CCATAATTGG 12720 

CCTTGGCCTC ACTAGAGGCA GTTTCAAAGG CAGAAACGAG GTCATCTGGT TTTTCAACCT 12780

TACGAATCCC TTTACCACCT CCACCTGCTG AAGCCTTGAG CATAACAGGA TAGCCAATTT 12840 

TTTCAGCAAC AATCAAAGCT TCTTCAGAGT TATGCACTTC TCCATCTGAA CCTGGTATAA 12900

CAGGCACACC TGCTTTAATC ATCTGAGCAC GCGCATTGAT CTTATCCCCC ATCATATCCA 12960

TAACATGACC AGATGGACCG ATAAACTTGA TACCTACTTC TTCACACATG GTCGCAAATT 13020 

TGGAATTTTC ACTGAGAAAT CCAAAACCAG GGTGAATAGC TTCTGCCTCA GTCAAGACTG 13080 

CAGCTGATAG AACTGCATTA ATATTGAGAT AAGACTCTGT TGCCTTGCCA GGACCAATAC 13140

AAACTGCTTC ATCTGCCAAA AGCGTATGAA GAGCTTCCTT ATCAGCAGTT GAATAAACCG 13200

CTACCGTCGC AATCCCCAAT TCACGTGCCG CACGGATAAT ACGAACCGCA ATTTCACCAC 13260 

GATTGGCAAT TAAAATTTTT CGAAACATGG AGAACCTCCT TAGTTCCCAA TTGCAAAAGT 13320 

AAGGGTACCA CTGGCTGCAA GCTTGCCATC CACTTCAGCC TTTGCTTCAA CCACAGCTAT 13380 

GGTGCCACGA CGTTTTACAA AAGTCGCTGT CATAACCAAT TGGTCGCCTG GTACAACTTG 13440 

CTTCTTGAAC TTAACCTTGT CCATACCAGC GTAAAAGACC AGTTTTCCTT TATTTTCAGG 13500 

TTTTGATAAC TCCAACACAC CGGCAGTTTG CGCCAAGGCT TCCATAATCA CAACACCTGG 13560 

CATAACTGGG TATTGAGGAA AGTGGCCGTT AAAGAAAGGC TCGTTGATGG TCACATTTTT 13620 

GATAGCAACA ATGGTATCCT CGCTCACTTC CAAGACACGG TCCACTAGAA GCATAGGATA 13680 

ACGGTGGGGA AGAGCTTCTT TGATTCCTTG AATATCGATC ATTTGATACG TACCAATCCT 13740 

TTACCAAACT CAACCATTTC TTCGTTAGAG ACGAGAATTT CCGTTACCAC ACCATCCTTA 13800 

GGAGCTGGGA TTTCATTCAT GACTTTCATG GCTTCGATAA TTACCAATGT TTGACCTTTT 13860 

TTGACACTAT CACCAACTGT AACGAAGGCA GGTTTATCTG GTCCAGCAGC CAAGTAAACC 13920 

ACTCCAACAA GTGGACTCTC TACAAGATTT CCCTCAGTAG CCACACTTGC TTCAGCTGGA 13980 

GCTGGAACTT CTTCTGCTAC AGTCTCTGCT GGAGCAGATG TAGGAGCTAC TGGACTCGGT 14040 

GTTGCTAGAA CGGGTGCTGG AGCGACTTGA GTTGCAACTT CAGGCACAGG TCTTGCTTCA 14100 

TTCTTGCTAA ACTGCAACTC ATCCGTCCCA TTTTTATAAG AAAATTCTCT CAAACTTGAC 14160 

TGGTCAAATT GAGTCATCAA GTCTTTAATA TCGTTTAAAT TCATACTTAT CTATTCTCCC 14220 

AACGTTTGAA AGCAAGAACT GCATTGTGGC CTCCAAAACC AAAAGTATTT GAAATAGCGT 14280 

ATGGAATTTC TTTCTCCAAG CCTTGTCCAT AAACGACATT AGCTTCGATA TAATCTGATA 14340 

CTTCACTTGT CCCAGCTGTC ATTGGTACAA AGTTATGACG CATAGCTTCG ATGGTGACGA 14400 

TAGCTTCTAC TGCACCCGCA GCCCCCAGCA AATGTCCTGT AAAAGACTTG GTTGATGATA 14460 

CAGGTACTTC CTTACCAAGA ACAGCTACGA TAGCACCACT TTCTCCTTTT TCATTGGCAG 14520 

GAGTTGACGT TCCGTGAGCA TTGACATAGG CTACTTGCTC TGGAGAAATC TCAGCTTCTT 14580 

CCAAGGCTAG TTTGATGGCC TTGATAGCTC CCTGACCTTC TGGATGTGGA GAAGTCATGT 14640 

GGTAGGCATC ACAAGTATTT CCGTAACCAA CCACTTCAGC CAGGATAGTA GCTCCACGTT 14700

TTTCAGCGTG TTCAAGACTT TCTAGAACCA ACATCCCTGA ACCTTCACCC ATAACAAACC 14760

CATTGCGATC CTTATCAAAT GGGATCGAAG CACGAGTTGG ATCCTCTGTA GTAGAGAGAG 14820 

CTGTTAAGGC TTGGAAACCA GCGATGGCAA AAGGTGTGAT AGAAGCTTCT GTTCCTCCCA 14880

CCAACATCAC ATCTTGGAAA CCAAACTTAA TGGAGCGGAA GGCATCCCCA ATCGCATCAT 14940 

TTGATGAAGA GCAGGCAGTA TTGATAGATT TACAAACACC GTTTGCACCA AAACGCATGG 15000 

CTACATTCCC AGAAGCCATA TTTGGTAAAG CTTTTGGAAG AGTCATTGGT TTGACACGTT 15060 

TGGGTCCTTT TTCATGAAGG CGAAGTACCT GATCTTCAAT TTCCTTGATT CCACCAATAC 15120 

CAGATGCAAC GATAACACCA AAACGATCCC TATTAAGAGC CTCTACATCA AGATTGGCAT 15180 

GATTTACAGC CTCTTGGGCT GCATACAAGG CATATAAAGA ATAGTTATCA AAACGGTTGG 15240 

TATCTTTTTT TACAAAGTAT TTATCGAACG GAAAATCTTG GATTTCTGCC GCATTATGCA 15300 

CATCAAAGTC ACTATGATCA AATTTTGTAA TGCCACCAAT GCCGATTTTC CCAGTTGCTA 15360 

AACTATTCCA AAATTCTTCT GGTGTATTTC CGATTGGAGA TGTTACTCCA TAACCTGTTA 15420 

CCACTACTCG ATTTAGTTTC ATTCTTTTCA CCTCTAGCTT TCGCTACATA CTTAAGCCAC 15480 

CATCAATGGC AACCACTTGT CCAGTTAGAT AATCTTGGCC TGCTAAAAAT ACTGTCAAAT 15540 

CTGCAACCTG CTCTGCCTGC CCAAATTCTT TCATCGGAAT CTGAGCTAGT GTAGCTTCCT 15600 

TAATCTTATC TGACAGGATA GCGGTCATAT CAGACTCAAT CATTCCTGGA GCAATCACAT 15660 

TGACTCGTAT ATTCCGACTA GCGACCTCGC GTGCCACAGA CTTGGTAAAG CCAATCAAGC 15720 

CAGCCTTAGA AGCAGCATAA TTAGCTTGAC CAATATTCCC CATCAAACCA ACAACACTAG 15780 

ACATATTAAT GATAGCACCT TCTCTGGCTT TCATCATCGG TTTCAAGACT GATTGTGTCA 15840 

TATTAAAGGC ACCAGTCAGA TTGACCTTGA GCACTTTTTC AAAATCTGCT TCTGTCATCT 15900 

TGAGCATAAG AGTATCTTGG GTAATCCCTG CATTGTTGAC CAAAACATCT ACTGAACCCA 15960 

GTTCTGCAAT AGCTTGATCA ATCATACGCT TAGCGTCTGC AAAATCTGAT ACATCTCCTG 16020 

AAATGGGAAC CACCTTGATA CCATAGTTTG AAAACTCAGC GAGCAATTCT TCTGAGATTG 16080 

CCCCACGACT GTTTAAGACA ATGTTGGCTC CTGCTTGAGC AAACTTGTGG GCGATGGCAA 16140 

GACCAATTCC ACGACTCGAA CCTGTAATAA AGATATTTTT ATGTTCTAGT TTCATTTTTT 16200 

TCCTTTCAAA ACTTCTACTT ATTTTAGTCT ATTTTTCTAA AAGTGCTACT AAACTCGCTT 16260 

GATCTTCCAC ATGAGCTAAG TGAGCAGTTT GATCAATTTT TTTAACAAAA CCTGACAAGA 16320 

CTTTCCCCGG TCCAATCTCG ATAAAGTTGC TTATGCCTGC TTCTTGCATG ACCCCAATAC 16380 

TTTCATAGAA ACGAACGGGT TCCTTGACCT GACGCGTCAA GAGCTGAGCA ATGTCCTCTT 16440

TTTGCATCAC AGCAGCTTCT GTATTGCCGA CTAGGGGACA AGTAAAATCT GAAAAACTTA 16500

CCTGAGCTAG AGTTTCAGCT AGTTTCTGGC TAGCAGGTTC AAGGAGAGCG GTGTGAAAGG 16560 

GACCTGACAC CTTAAGAGGA ATCAAGCGTT TGGCACCTGC TTCTTGCAAA AGTTCAACCG 16620 

CTCGATCAAC TGCAACCACT TCTCCAGCAA TGACGATTTG TGCAGGTGTG TTATAGTTGG 16680 

CTGGAGTAAC CACTCCAAGT TCAGAAGCTT TTTGACAGGC TTCTTCAATG ACCTCTACTG 16740 

GCGTATTGAG AACTGCTACC ATCTTGCCAG AGTCAGCAGG AGCCGCTTCT TCCATATAGG 16800 

CTCCACGCTT AGCTACCAAG GCAACCGCAT CTTCAAAATC CAAGGCGCCA CTTGCCACCA 16860 

AGGCAGAGTA TTCTCCAAGA GACAAACCAG CAACCATATC AGGCTGATAG CCCTTTTCTT 16920 

GCAATAAACG GTAGATAGCA ACCGAAGTCG CTAGAATGGC TGGTTGCGTA TAGCGGGTCT 16980 

GATTGAGTTT GTCTTCTTCC GTATCGATGA GATAACGCAA ATCATAACCG AGCACCTGGC 17040 

TCGCTCGATC AATCGTTTCT TTAACAATCG GATACTGATC ATAGAAATCC CGTCCCATCC 17100 

CTAGATACTG GGCACCTTGA CCAGCAAATA AAAAGGCTGT TTTAGTCATT TCTTACAACT 17160 

CCTGTCCAGC GAGAGGCTTC TTCTTGAATT TTCTTAGCGG CTCCGTAATA CAAATCTTTT 17220 

AGGATTTCTT CAGCTGTTTC TTCTTTAGAA ACAAGCCCTG CGATTTGACC TGCCATAACA 17280 

GAGCCACCAT CCACATCACC GTGAACAACT GCTTTGGCTA GAGCACCTGC TCCCATTTGT 17340 

TCAAAGATTT CTAAATCAGG ATCTTCTTGC TTAAAGGCAT CTTTTTCAGC CAGTTCAAAA 17400 

TCTCTAGTCA ACTGATTTTT AATAGCACGA ACAGCATGAC CAAAGTGCTG AGCTGAAATC 17460 

GTAGTATCAA TATCCCTTGC TTTTAAAATT TTCTCCTTGT AGTTTGGATG GGCATTCGAC 17520 

TCTTTTGCAA CTACAAACCG TGTCCCCACC TGTACAGCCT CTGCACCTAG CATAAAGCCA 17580 

GCCGCAGCAC CTTCACCATC CGCAATTCCT CCTGCAGCAA TAACAGGAAT AGATATAGCT 17640 

GTGGCTACCT GTCGCACCAA GGTCATGGTT GTTAATTTAC CGATATGCCC CCCAGCTTCC 17700 

ATTCCTTCTG CAATAACAGC GTCTGCACCG ATTTTTTCCA TGCGTTTAGC TAAAGCGACA 17760 

CTAGGAACAA CAGGAATAAC GATTATCCCA GCTTCATGGA AACGTTCCAT ATACTTGCTT 17820 

GGATTTCCTG CTCCTGTTGT GACAACTTTA ACACCTTCTT CAATAACGAG ATCCACGATG 17880 

TCTTCCACAA AGGGAGATAA GAGCATGATG TTGACCCCAA AGGGTTTATC AGTCAATGAT 17940 

TTGATTTTAT CAATATTGGC CTTGACAACT TCTTTCGGGG CATTTCCCCC ACCGATAATT 18000 

CCTAATCCTC CAGCCTTGGA AACAGCCCCT GCCAAATCAC CATCAGCAAC CCAGGCCATC 18060 

CCTCCTTGGA AAATAGGATA ATCAATCTTC AATAATTCTG TAATACGCGT TTTCATAGTG 18120 

CCTCCAACCT TCCTTGCTTA CGTAATAGTT CGATTTCACC ATAATTTGAC AGTCAAACTA 18180 

TTACCTAAAC AAGAGGGAGT GGGTTTCTCC CTACTCCTTC TACTAATATT CTGCTTATTT 18240 
TGCTTGCTCT TCAACGTAAG CAACCAAGTC ACCAACTGTT TTCAAGTCAT TTTCTGCTTC 18300 

GATTTGGATA TCAAAAGCAT CTTCGATTTC TGAGATTACT TGGAACAAGT CCAATGAATC 18360 

TGCGTCCAAA TCATCAAAAG TTGATTCAAG TGTTACTTCT GATGCGTCTT TTCCAAGTTC 18420 

TTCAACGATA ATTTCTTGTA CTTTTTCAAA TACTGCCATG ATAGGACTCC TTTAAAATAA 18480 

ATAGTTTTTT TATAACAATG TGTTCACCAC ATGATTACCT AAATTGTAAG AATGAGCGTG 18540 

CCCCAGGTCA AGCCTCCACC GAAGCCTGAT AGAAGAACAG TCTGGCTACC ATCTAAAGGG 18600 

ATGAGACCTT GTTCTACACA CTCTGAAAGT AAAATCGGGA TACTGGCTGC ACTGGTATTG 18660 

CCATATTCCA TCATATTGGC TGGAAGTTTG GCTCGGTCAA CACCAATTTT TCTAGCCATC 18720 

TTATCCAAAA TACGGTCATT GGCTTGATGA AGTAGCAGAT AATCCAAGTC TGTCACCTCT 18780 

ATAGGAGATT CATCAATAGT CTGCTTGATA GACTTGGCTA CATCTCGAAT GGCAAAATCA 18840 

AAGACTGTGC GTCCATCCAT CTTCAAAAAC GAATCTGCAC TTTCTTGATC TGAAAATGGA 18900 

GAATGTAAAC CTGAATGCCC ATAAGTTAAA CACTCGCTGC GACTTCCATC GCTATTGAGA 18960 

CTCTCAGCTA AGAAATGCTC TTGCTCGCTA GCTTCTAACA AGACACCACC AGCACCATCT 19020 

CCAAACAACA CAGCTGTTGA TCGATCCGAC CAATCGACTG CCTTAGAGAG GGTTTCACTA 19080 

CCAATCACCA AGCCTTTTTG AAAGCGACCA GAAGCGATAA ACTTTTCAGC AGTTGAAAGA 19140 

GCAAATACAA ATCCACTGCA AGCCGOGGTT AAGTCAAAAG CAAAGGCTTT ATTAGCACCA 19200 

ATATTAGCTT GAACACGAGC AGCTGTAGAG GGCATCATCG AATCTGGAGT AATGGTAGCT 19260 

AGGATGATAA AATCCAGTTC TTCTCCTGTT ATTCCAGCTT TTGCCATCAG TTTCTTAGCA 19320 

ACCTCTGTAG CCAAATCACT GGTAGATTCT GTTCTTGAAA TATGCCTTTG TCGTATTCCC 19380 

GTTCGACTTG AAATCCACTC ATCATTGGTA TCCATAATCT GAGCCAAGTC GTGATTTGTA 19440 

ACCACTTGCT CTGGCACATA ATGAGCAACC TGACTTATTT TTGCAAAAGC CATTATTTCA 19500 

AATCCTCCAA AAATTGGTAA AGATTAGTCA AACCTTTACC CATGACAGCA ATTTCTTCCT 19560 

CGCTCATGCC ATCAATAATT TTTTCTACCA TGGCCTTGTG GAAGCGTTTA TGCAGTCTAT 19620 

GAATCAAGCG ACCCTTCTTT GTCAAATGCA GATGCACCAC ACGACGATCC TGTTCTGACC 19680 

GAACTCGCTC AATGTAGCCC GG 19702 


(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6211 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

GAAAATTTCC TCTCTTCTCT TGAAAAATTT TGAAAAAATG GTATGATAGT AACAAGTTAT 60 

TTTTAAGAGG AAAGAAAGGG GAATAATGGA GAAAATCAGT TTAGAATCTC CTAAGACGGG 120 

GTCGGACCTA GTTTTGGAAA CACTTCGTGA TTTAGGAGTT GATACCATCT TTGGTTATCC 180 

TGGTGGTGCG GTTTTGCCTT TTTATGATGC GATATATAAT TTTAAAGGCA TTCGCCACAT 240 

TCTAGGGCGC CATGAGCAAG GTTGTTTGCA TGAAGCTGAA GGTTATGCCA AATCAACTGG 300 

AAAGTTGGGT GTTGCCGTCG TCACTAGTGG ACCAGGAGCA ACAAATGCCA TTACAGGGAT 360 

TGCGGATGCC ATGAGCGATA GCGTTCCCCT TTTGGTCTTT ACAGGTCAGG TGGCGCGAGC 420 

AGGGATTGGG AAGGATGCCT TTCAGGAGGC AGACATCGTG GGAATTACCA TGCCAATCAC 480 

TAAGTACAAT TACCAAGTTC GTGAGACAGC TGATATTCCG CGTATCATTA CGGAAGCTGT 540 

CCATATCGCA ACTACAGGCC GTCCAGGGCC AGTTGTAATT GACCTACCAA AAGACATATC 600 

TGCTTTAGAA ACAGACTTCA TTTATTCACC AGAAGTGAAT TTACCAAGTT ATCAGCCGAC 660 

TCTTGAGCCG AATGATATGC AAATCAAGAA AATCTTGAAG CAATTGTCCA AGGCTAAAAA 720 

GCCAGTCTTG TTAGCTGGTG GTGGAATTAG TTATGCTGAG GCTGCTACGG AACTAAATGA 780 

ATTTGCAGAA CGCTATCAAA TTCCAGTGGT AACCAGTCTT TTGGGACAAG GAACGATTGC 840 

AACGAGTCAC CCACTCTTTC TTGGAATGGG AGGCATGCAC GGGTCATTCG CAGCAAATAT 900 

TGCTATGACG GAAGCGGACT TTATGATTAG TATTGGTTCT CGTTTCGATG ACCGTTTGAC 960 

GGGGAATCCT AAGACTTTCG CTAAGAATGC TAAGGTTGCC CACATTGATA TTGACCCAGC 1020 

TGAGATTGGC AAGATTATCA GTGCAGACAT TCCTGTAGTT GGAGATGCTA AGAAGGCCTT 1080 

GCAAATGTTG CTAGCAGAAC CAACAGTTCA CAACAACACT GAAAAGTGGA TTGAGAAAGT 1140 

CACTAAAGAC AAGAATCGTG TTCGTTCTTA TGATAAGAAA GAGCGTGTGG TTCAACCGCA 1200 

AGCAGTTATT GAACGAATTG GTGAATTGAC GAATGGAGAT GCCATTGTGG TAACAGACGT 1260 

TGGTCAACAC CAAATGTGGA CAGCTCAGTA TTATCCCTAC CAAAATGAAC GTCAGTTAGT 1320 

GACTTCAGGT GGTTTGGGAA CAATGGGCTT TGGAATTCCA GCAGCAATCG GTGCTAAAAT 1380 

TGCTAACCCA GATAAGGAAG TAGTCTTGTT TGTTGGGGAT GGTGGTTTCC AAATGACCAA 1440 

CCAGGAGTTG GCTATTTTGA ATATTTACAA GGTGCCAATC AAGGTGGTTA TGCTGAACAA 1500 

TCATTCACTT GGAATGGTTC GCCAGTGGCA GGAATCCTTC TATGAAGGCA GAACATCAGA 1560 

GTCGGTCTTT GATACCCTTC CTGATTTCCA ATTGATGGCG CAGGCTTATG GTATTAAAAA 1620 

CTATAAGTTT GACAATCCTG AGACCTTGGC TCAAGACCTT GAAGTCATCA CTGAGGATGT 1680 
TCCTATGCTA ATTGAGGTAG ATATTTCTCG TAAGGAACAG GTGTTACCAA TGGTACCGGC 1740 

TGGTAAGAGT AATCATGAGA TGTTGGGGGT GCAGTTCCAT GCGTAGAATG TTAACAGCAA 1800 

AACTACAAAA TCGTTCAGGA GTCCTCAATC GCTTTACAGG TGTCCTATCT CGTCGTCAGG 1860 

TTAATATTGA AAGCATCTCT GTTGGAGCAA CAGAAGATCC GAATGTATCG CGTATCACTA 1920 

TTATTATTGA TGTTGCTTCT CATGATGAAG TGGAGCAAAT CATCAAACAG CTCAATCGTC 1980 

AGATTGATGT GATTCGCATT CGAGATATTA CAGACAAGCC TCATTTGGAG CGCGAGGTGA 2040 

TTTTGGTTAA GATGTCAGCG CCAGCTGAGA AGAGAGCTGA GATTTTAGCG ATTATTCAAC 2100 

CTTTCCGTGC AACAGTAGTA GACGTAGCGC CAAGCTCGAT TACCATTCAG ATGACGGGAA 2160 

ATGCAGAAAA GAGCGAAGCC CTATTGCGAG TCATTCGCCC ATACGGTATT CGCAATATTG 2220 

CTCGAACGGG TGCAACTGGA TTTACCCGCG ATTAAAAATC CAACTTAAAT TTATTAAACC 2280 

AGCCTAAAAG GCAATAAATA ATAGAAAAGA GAGAAAAGCT ATGACAGTTC AAATGGAATA 2340 

TGAAAAAGAT GTTAAAGTAG CAGCACTTGA CGGTAAAAAA ATCGCCGTTA TCGGTTATGG 2400 

TTCACAAGGG CATGCGCATG CTCAAAACTT GCGTGATTCA GGTCGTGACG TTATTATCGG 2460 

TGTACGTCCA GGTAAATCTT TTGATAAAGC AAAAGAAGAT GGATTTGATA CTTACACAGT 2520 

AGCAGAAGCT ACTAAGTTGG CTGATGTTAT CATGATCTTG GCGCCAGACG AAATTCAACA 2580 

AGAATTGTAC GAAGCAGAAA TCGCTCCAAA CTTGGAAGCT GGAAACGCAG TTGGATTTGC 2640 

CCATGGTTTC AACATCCACT TTGAATTTAT CAAAGTTCCT GCGGATGTAG ATGTCTTCAT 2700 

GTGTGCTCCT AAAGGACCAG GACACTTGGT ACGTCGTACT TACGAAGAAG GATTTGGTGT 2760 

TCCAGCTCTT TATGCAGTAT ACCAAGATGC AACAGGAAAT GCTAAAAACA TTGCTATGGA 2820 

CTGGTGTAAA GGTGTTGGAG CGGCTCGTGT AGGTCTTCTT GAAACAACTT ACAAAGAAGA 2880 

AACTGAAGAA GATTTGTTTG GTGAACAAGC TGTACTTTGT GGTGGTTTGA CTGCCCTTAT 2940 

CGAAGCAGGT TTCGAAGTCT TGACAGAAGC AGGTTACGCT CCAGAATTGG CTTACTTTGA 3000 

AGTTCTTCAC GAAATGAAAT TGATCGTTGA CTTGATCTAC GAAGGTGGAT TCAAGAAAAT 3060 

GCGTCAATCT ATTTCAAACA CTGCTGAATA CGGTGACTAT GTATCAGGTC CACGTGTAAT 3120 

CACTGAACAA GTTAAAGAAA ATATGAAGGC TGTCTTGGCA GACATCCAAA ATGGTAAATT 3180 

TGCAAATGAC TTTGTAAATG ACTATAAAGC TGGACGTCCA AAATTGACTG CTTACCGTGA 3240 

ACAAGCAGCT AACCTTGAAA TTGAAAAAGT TGGTGCAGAA TTGCGTAAAG CAATGCCATT 3300 

CGTTGGTAAA AACGACGATG ATGCATTCAA AATCTATAAC TAATTAGAAA TATATAGCGC 3360 

TGGAGATGAT TTTATGAAAA AGATTATGAG AAAAATTGCA TCGTTATTAT TGGTTCTAGT 3420 
TGTATAATGT AATTACACCG TCGGTAATAG TGCTAGCAGA CCAAAATAAA GCAGATTGGT 3480 

CGTATGATGA AAATGCTGTA ATTAACATTT ATGATGATGC TAATTTTGAA GATGGTAGGT 3540 

TGCATATGAA CTTTGAACAA TTCTTCAAAT TGGCACAAAT AGCTAGAGAA GAAGGTCTTG 3600 

AAATTCATTC TCCGTTTGAG AGAGCTGGTG CGACTAAATC TGCTCGTTAT ATAGCGAAAT 3660 

GGATTTTGAG AAATAAAAAA CATTAACAAA TATAGTTGGT AAATCATTAG GACCTAAATC 3720 

AGCTGTTAGA TTCGGAGAAG CTTTATCCTA TATTGAAGGT CCTCTTCGCA GAATAAATGA 3780 

GACGATAGAT GGCGGTTTAT ATCAAATAGA GCAAATTATT GCATCTGGAT TGAAAGAATC 3840 

GGGTTTAAAT GACTGGACTG CGAAAACTTT AGCTTCAGCT ATTCGTGGGA TATTAGATGT 3900 

ACTTATTTAG GGGTTGAAAT CATATGAATA TTACCAATTT GTTTTCTATC AAGACAGGAT 3960 

GTGATGAAAC TGATAGGCAA CTGCAAAAAC TATTTTTTCA GTTGGATTTA CAATTGGGAG 4020 

AATTGACAGA TCAACTAAGA AAATTAGATT CTAATTTTGT TCCTCGTAGT CAATTTGTAG 4080 

ACACGTTGGA TTTGAATGAT GTAGAATATA AAGAAATTTT AAACTATTTT ATCTTCCATC 4140 

GTAATGATAG TGAAGAAAGT TTGGTAGAAT GGTTATATGA TTGGATTTCC ACAAATCGTT 4200 

ATGAACTTCC TAAAGAGTTT TCGATTCGTA TGGCTCATAA ATACCATGAA AGTGTTACTG 4260 

AAGTTTTCGG AGATGAATAA CTAAAAAACA GTCATTAGTG ACTGTTTTTT ATAGAAAAAG 4320 

AGGTTTTATA TGTTAAGTTC AAAAGATATA ATCAAGGCTC ACAAGGTCTT GAACGGTGTG 4380 

GTTGTGAATA CTCCACTGGA TTACGATCAT TATTTATCGG AGAAGTATGG TGCTAAGATT 4440 

TATTTGAAAA AAGAAAATGC CCAGCGTGTT CGCTCCTTTA AAATTCGTGG TGCCTATTAT 4500 

GCCATTTCCC AGCTCAGCAA GGAAGAACGT GAACGTGGGG TAGTCTGCGC TTCTGCGGGA 4560 

AATCATGCGC AGGGAGTAGC CTATACTTGT AATGAAATGA AAATTCCTGC TACTATCTTT 4620 

ATGCCCATTA CTACGCCACA ACAAAAGATT GGTCAGGTTC GCTTTTTTGG TGGGGATTTT 4680 

GTAACTATTA AACTAGTTGG AGATACCTTT GATGCCTCAG CCAAAGCAGC TCAAGAATTT 4740 

ACAGTCTCTG AAAATCGTAC CTTTATTGAT CCTTTTGATG ATGCTCATGT TCAAGCAGGT 4800 

CAAGGAACAG TTGCTTATGA GATTTTAGAA GAAGCTCGAA AAGAATCGAT TGATTTTGAT 4860 

GCTGTCTTGG TTCCTGTTGG TGGTGGCGGT CTCATTGCCG GGGTTTCTAC CTATATCAAG 4920 

GAAACAAGTC CAGAGATTGA GGTTATCGGA GTAGAGGCGA ATGGAGCGCG TTCCATGAAA 4980 

GCTGCCTTTG AGGCTGGAGG TCCAGTAAAA CTCAAGGAAA TTGATAAATT TGCTGATGGG 5040 

ATTGCTGTGC AAAAGGTAGG TCAGTTGACC TATGAAGCAA CTCGTCAACA TATTAAAACT 5100 

TTGGTAGGTG TCGATGAGGG ATTGATTTCT GAAACCTTGA TTGACCTTTA CTCTAAGCAA 5160 

GGGATAGTCG CAGAACCTGC TGGAGCGGCT AGTATCGCCT CTTTAGAGGT TTTAGCTGAA 5220 
TATATTAAGG GGAAAACCAT TTGTTGTATC ATTTCTGGAG GAAATAATGA TATCAACCGT 5280 

ATGCCAGAAA TGGAAGAGCG TGCCTTGATT TATGATGGTA TCAAACATTA CTTTGTGGTC 5340 

AATTTCCCAC AACGTCCAGG AGCTTTGCGT GAGTTTGTAA ATGATATCCT GGGGCCAAAT 5400 

GATGATATCA CACGTTTTGA GTATATCAAA CGAGCTAGCA AGGGAACAGG CCCAGTATTA 5460 

ATTGGGATCG CTTTAGCAGA TAAGCATGAT TATGCAGGTT TGATTCGTAG AATGGAAGGT 5520 

TTTGATCCAG CTTATATTAA CTTAAATGGT AATGAAACGC TTTATAATAT GCTTGTCTGA 5580 

GGACTAATAA AAAAATATCA TACCTTCATT TTGATTTCCT ATCTATTGAC AAGCATAGTC 5640 

ACACTGTCTT TAATACTCTT CGAAAATCTC TTCAAACCAC GTTAGCTCTA TCTGCAACCT 5700 

CAAAACAGTG TTTTGAGCAA CTTGCGGCTA GCTTCCTAGT TTGCTCTTTG ATTTTCATTG 5760 

AGTATAAGGT ATGATTTGAT TTCTTTTTGT TGACAAATAT ACTATATTAA AAAGATATAT 5820 

AAGTAATTAA CTGAGCTTAT CTGTCTTGTC ATCTCTATTA AGGATGGTTT AGATAATCGG 5880 

GTGTCTGCTT CTAGGCTAGC ACCTCAATAT CCAAAGGAGT GATGAATTTG AAGGACATAA 5940 

GGAATACCTA TCTCTCAGAT GATTTATTGA GGAAGAAAGA TAGGAGTTTT TGAGCTAGTG 6000 

AAGGCTTGGA TTTCTAAAGG TTAGAACTAT CATCTTCAGT TCTTAAATCG AAGAAATAAG 6060 

CTATCTTACG GAAATAGAGA AGCATTTTTT AAGAACTTGA ATAATTTCGC ACCTTAAGAG 6120 

GGTAATAATA CAGTATTTTT ATTAGCAAAT ATTTATGGTG TAGAGGCTAG CAAAACCTAT 6180 

ATATTATCGG ATTTAAAAAG GAAGTAAGAA A 6211 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7939 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

CCGGACTCCC CACGATTCTT CAAAATAACT GAGTATATTT CTATCTTGAT TTTCAGATAT 60 

AAATTCTTCC TTCTGTGGCC TCTTCTTACG CTTGAGAAGA GCTTCTCCGA CATGGCTTCT 120 

TCCTTACTGA GCAAAACCTT GAGCATAGAT AAGTTTGACT GGCAAGCGTG CTCTTGTATA 180 

TTTGGCTCCC TTCCCACTAT TGTGGATAGC GAGGCGTCTT CTCATATCAG TCGTATAGCC 240 

TATATAGTAG GATCCATCAC GACACTCCAG AACGTACATA TAAGCCTTAT GATCCATAAT 300 

AAATCTCTTC GATTTCGGGC GTATAAGAGC CATCATCATT GTGGACAATC AAAGGAGGTA 360 
AGACCTTAAA GCCACTTGTT GAGCCATCCT TGATCGCCTC AATCAAAAGC ATATTGGCTT 420 

CCTTTTCTCT TTTTGGATAA ACAAACTGCA GGCGCTTAGG GGCTAGATTA TGTCGTTTTA 480 

ACGTATCCAA AATATCCAGA AGTCGATCAG GACGATGAAC CATGGCCAAA CGCCCATTAG 540 

ACTTGAGAAT ACTCTGGGCA CTACGACAGA TTTCTTCCAA ATTAGTCGTG ATTTCGTGTC 600 

GAGCCAAGAG ATAATGTTCA CTCTCGTTCA GATTAGAATA AGGATTCACC TTGAAATAGG 660 

GTGGATTACA CAAAATCATA TCCACCTTAC TCCCCTGAAT GTGAGCAGGC ATATTTTTCA 720 

AATCATCGCA GATGACCTGC ATTTGCTCCT CTAATCCATT CAAACGGACA GAGCGTTCAG 780 

CCATATCCGC CAAACGCTCC TGAATCTCAA CAGACAATAT CTGTGCTTGA GTACGAGTGC 840 

TAGCAAAAAG CCCCACTGCT CCATTCCCAG CACAGAAATC CACAATCAAC CCCTTCTTAG 900 

GAAAACGTGG AAATCGTGAT AAGAGAACAC TATCCACCGA ATAGCTAAAA ACCTCTCTAT 960 

TTTGAATGAT TTTGATATCT GTCGAAAAGA GCTGGTTAAT GCGCTCTCCT GATTTTAATA 1020 

ATTGTTCTTC TTCCATGGTC CTATTATAGC AAATTCATAT TAACATTACA AAAAATATAA 1080 

AACTCTAAAC TACTTCTTCT TTTTTAAATG GTGCAGGGCT TCTCCAGTCC AGATTGGTAG 1140 

CATTCGTCGA AAGGGAGCAA AGCCGTAGTT AAAGCGGTCG CTTGAAAAGC GTCTCCGTCT 1200 

AGGAAACTGG TACTTTTCTT CCTCCAAAGT GCGGATAGAA AGACTGGCTT TCCCTGTAAA 1260 

TTCATCTAAA TCCACTACCT GAACTTGAAC CTCTTCATCG ACTTTCAAGG TTTCATGAAT 1320 

ATTTTCAATA AATCCTGTCC GAATCTCTGA AATGTGAATC AGCCCCGTAT CACCCGTCTC 1380 

TAACTCAACA AAGGCACCGT AGGGCTGAAT CCCTGTAATA CGCCCCTTTA GCTTATCACC 1440 

GATTTTCATC TTAGTCCTCG ATTTCAATAG TTTCAATTAC AACATCTTCA ACTGGCTTGT 1500 

CCATAGCTCC TGTCTCAACA GCAGCAATGG CATCCAAGAC AGCGTAAGAT GCTTCATCAG 1560 

CTAACTGACC AAAAACCGTG TGACGGCGGT CTAGGTGAGG TGTCCCACCT TGATTGGCAT 1620 

AGATTTCTGC AATCGGTTCT GGCCAACCAC CACGAGTAAT TTCTTTCTTA GAATAAGGTA 1680 

GGTGTTGGTT TTGCACGATA AAGAACTGGC TGCCGTTGGT ATTTGGACCA GCATTTGCCA 1740 

TGGAAAGAGC ACCACGGATA TTGTAAAGCT CTTCTGAGAA TTCATCCTCA AAAGATTCGC 1800 

CGTAGATTGA CTCGCCACCC ATACCAGTTC CAGTTGGGTC TCCACCTTGG ATCATAAAGT 1860 

CCTTGATAAT ACGGTGGAAA ATGACACCAT CATAGTAGCC ATCTTTTGAA AGAGATACAA 1920 

AGTTAGCCAC TGTTTTAGGA GCATGTTCAG GGAAAAGCTT GATACGTAAG TCTCCGTGAT 1980 

TGGTCTTAAT AGTCGCAAGA GGACCTTCTA CTGTTTCAAT GTCTACTTGT GGAAAATGCA 2040 

ATTCTTTTTC TACCATACCA AATACTTCTA AGGCAGCAAA AATGCCATCT TCTTCTAATG 2100 

TTTTTGTAAT ATAATCTGCT TTTTCTTTGA TTTTATCATG AGAAATTCCC ATGGCAACGC 2160 
TGATTCCAGC ATAATCAAAG AGTTCCAAGT CGTTGAGACC ATCTCCAAAA ACCATGACCT 2220 

TCTCTGGTTT CAAGCCAAGG TGTTCCACAA CCTTTTCCAC CCCCGTCGCT TTGGAGCCTG 2280 

AAATCGGCAC AATATCAGAC GAATGTTGAT GCCAACGAAC CATGCGAAGT TTGTCTGAGA 2340 

GACTGTCAGG CAAGTGCAAG TCATCTCCCT TATCTTCAAA AGTCCACATC TGATAGATAT 2400 

CTTCTTTTTC ATGGAAATCG GGATCTACAT CTAAGTCGGG ATAAATTGGA TTGATAGCTT 2460 

CACTCATCAT ATCGGTGCGA GTCGACAACT TGGCATCATG ACTCCCAACC AAGCCATACT 2520 

CAATTCCTTC TTGCTTAGCC CAAGAGATAT ACTCCTCAAC ATCTGACTTT TCAATCTGAT 2580 

GCTGATAAAT GACCTGACCT TTTTTATCTT CGATATAAGC CCCATTCAAA GTTACAAAAA 2640 

AGTCAGGCTT GAGATCACGA ATCTCTGGAA CAACACCAAA AATGCCACGT CCAGAGGCGA 2700 

TTCCTGTTAA AATTCCTTTT TCACGCAACT GTTTAAAAAC AGTGGGAATT GTAGTTGGAA 2760 

TAAACCCTGT CTTTGAATTC CGCAATGTAT CATCAATATC AAAAAAGACA ATCTTGATCT 2820 

TCTTTGCCTT GTATCTTAAT TTCGCGTCCA TCTCACTACC TCTTTCAATC TAACTCTTTC 2880 

CATTATATCA TAAAGTAGGC AAATCCCCTA TTTTCAAAAA GTTTATCATT TTTATTTTAA 2940 

TTTCTTGGAT GAGAAAAGAG ACATATTTAT GAAAAAGCTC CATCGTGCTT TTAATGTGTT 3000 

CTCTTGTTTT CAAACTCGTA AAAAGGGAGC CACTGATCCT AACTCGCTCT CTCATTTCAA 3060 

AGCTTGTGAA AAAAGACCCG TTGGGGTCTT AATTCGCTTT CTTGTTTTCA AGCTCATGAA 3120 

AAAGAGACCC AACTGGGTCT TTTCTTTAAT CTTCGTTTAC GAAAGGCATC AAAGCCATTA 3180 

CGCGAGCGCG TTTGATAGCT GTTGTTACTT TACGTTGGTT TTTAGCTGAA GTTCCTGTTA 3240 

CACGACGAGG AAGGATTTTC CCACGTTCTG AAACGAAACG GCTAAGAAGC TCAGTATCTT 3300 

TGTAATCAAC ATATTCAATT TTGTTTGCTG CGATGTAATC AACTTTTTTA CGGCGTTTGA 3360 

ATCCGCCACG ACGTTGTTGA GCCATGTTTT TTCTCCTTTA TAAGTTTAGT TGTCCATTAG 3420 

AATGGTAAAT CATCATCTGA AATATCCAAT GGGTTTGTTG CTCCAAATGG ATTTTCATTA 3480 

CGTGAAAAGT CTGGTACTGA ATTTGTAGGT GCTGAATAGT TTGCAGTTGG TGCAGAGTAA 3540 

GCTCCACCTG TGTGACCCTC ACGCACACTA CGGCTTTCCA ACATTTGGAA ATTCTCAGCC 3600 

ACGACCTCTG TCACGTAGAC ACGTTGTCCT TGCTGGTTAT CGTAACTACG AGTCTGGATA 3660 

CGACCTGTCA CCCCGATAAG TGAGCCTTTT TTAGCCCAGT TAGCAAGATT TTCAGCCTGT 3720 

TGGCGCCACA TAACGACATT GATAAAATCA GCCTCACGTT CACCATTTTG ACTCTTAAAT 3780 

GTACGGTTTA CTGCAAGAGT AAAAGTCGCA ACTGCTACAT TTGATGGGGT ATAACGCAAC 3840 

TCAGCGTCAC GTGTCATACG CCCTACAAGT ACAACATTGT TAATCATAGT TTACCTTCTT 3900 
ACGCGTCAAT TTTGACGATC ATGTGACGAA GAATGTCAGC GTTGATTTTT GAAAGACGGT 3960 

CAAACTCTTT AAGAGCTGCA TCGTCATTTG CTTCAACGTT AACGATGTGG TAAAGTCCTT 4020 

CACGGAAATC TTGGATTTCG TATGCAAGAC GACGTTTTTC CCAAGTTTTT GATTCAACAA 4080 

CAGTTGCACC GTTGTCAGTC AAAATAGAGT CAAAACGTGC TACCAAAGCG TTTTTAGCTT 4140 

CTTCTTCAAT GTTTGGACGA ATGATATAAA GAATTTCGTA TTTAGCCATT GATATGTTCC 4200 

TCCTTTTGGT CTAATGACCC CAAGACTTTG CAAGGGGTAA GTGAGGTTCG CTCACAATAA 4260 

ACTATTATAC TAGAAAAAAT TTTTTTACGC AAGTAAAAAC ACTAGAATTC GAAAAAACGC 4320 

CACATGGGCG TTTTCCTGTT CTTATGGTTT GATACGGTGC AACATACGTG GGAATGGAAT 4380 

AGCTTCACGG ATATGTTTTG TTCCTGCTGC GAAGGTTACC ATACGTTCGA TACCGATACC 4440 

AAATCCTCCG TGTGGAACTG TACCGTATTT ACGAAGGTCA AGGTAGAATT CATATTCTGT 4500 

ACGATCCATG CCAAGTTCAT CCATCTTAGC GACAAGGGCA TCGTAATCTT CCTCACGCAT 4560 

AGACCCACCG ATAATTTCTC CATAGCCTTC TGGAGCAAGC AAGTCTGCAC AAAGCACGCG 4620 

CTCTGGATTT CCAGGAACTG GTTTCATGTA GAAGGCCTTG ATGGCTGCTG GATAGTTCAT 4680 

GACAAATGTT GGCACACCAA AGTGGTTTGA AATCCAAGTT TCGTGTGGTG ACCCAAAGTC 4740 

ATCACCATGC TCAAGATGCT CGTAGTCAGC ATCTTCATCA TTTTCATGCT CTTGCAAGAG 4800 

GTCAATGGCT TGATCGTAAG TGATACGTTT GAATGGCTCT GCAATGTAGC GTTTCAAGAG 4860 

TTCTGTATCA CGTTCCAAGG TTTCCAAGGC TTGAGGCGCG CGGTCAAGAA CACCTTGTAG 4920 

AAGAGCTTTC ACATAAGCTT CTTGCAAGTC AAGCGACTCA TCATGTGTCA AGTATGAGTA 4980 

CTCAGCATCC ATCATCCAGA ACTCAGTCAA GTGACGGCGT GTTTTTGATT TTTCAGCACG 5040 

GAAAACTGGA CCAAAGTCAA AGACACGACC AAGAGCCATA GCCCCTGCTT CTAGGTAAAG 5100 

CTGACCTGAT TGGCTCAAGT AGGCTGGCGT TCCGAAGTAG TCAGTTTCAA AGAGTTCTGT 5160 

AGAATCTTCT GCCGCATTTC CTGAAAGAAT TGGGCTGTCA AACTTCATAA 

GTCAAAGAAC TCATAAGTTG CATAGATAAT AGCGTTACGG ATTTGCAACA CAGCTACTTG 5280 

CTTACGAGAG CGTAgCCACA AGTGACGGTT ATCCATCAAA AAGTCTGTTC cgtgttctit 5340 

TGGTGTGATT GGGTAGTCTT GAGATTCACC GATCACTTCG ATGTCTGTGA TGTCCAACTC 5400 

ATAGCCAAAT TTAGAACGTT CGTCCTCTTT GACAATACCT GTCACATAAA CAGACGTTTC 5460 

TTGGCTCAAG CGTTTGATAA CATCAAACTT CTCAAGTCCC ACTTCTTCAC CAAATTTTTC 5520 

GACAAAGTTT GGTTTAAAAG CCACACCTTG AAAGAAGGCT GTTCCATCAC GCAATTGTAA 5580 

GAAAGCGATT TTTCCTTTTC CTGATTTGTT GGCAACCCAA GCGCCAATCG TCACTTCCTG 5640 

ACCAACATAG TCTTTTACGT CAATAATCGT TACACGTTTT GTCATTATTT TTCCTTTTCT 5700 

TTTTTATTCT TTATGGCAAA CCACCTCTAT ATTGTTCCCA TCCAGGTCAA TCATAAAAGC 5760 

AGCATAGTAA ATCGGATGCT CACTTCGATA ACCAGGAGCC CCATTGTCTC GCCCACCTGC 5820 

CTCTAAGCCA GCCTCATAAC AAGCCTGAAC TTCTTCCTTA TTTTCTGCTA AAAAAGCAAA 5880 

ATGAACAGGA TCTTGTGTTC CCTGAGTCAG CCAAAAATCA CCACCAGGAT GAGGGCTGTT 5940 

CGGGGATAGA AAACTAATTA GAGAACTAGT CTTAAAAGCC AATTTATAGT CCAAAGGAGC 6000 

GAGAAAACTC CTATAAAATC CTTATGAAAT TTGTAAATCC TTTACCTTAA TCTCAAAATG 6060 

ATCAATCATT CTCACTACCC ATAAATGCTT TCAAGCGTTC GACTGCTTCT TTAAGCGTGT 6120 

CTAGGTCTGT CGCATAGCTG AGGCGGACAT TTTCTGGTGC TCCAAATCCA GCTCCTGTTA 6180 

CCAAGGCCAC TTCGGCTTCT TCTAAGATAA CAGTTGTAAA GTCTGTCACA TCCGTGTAGC 6240 

CTTTCATCTC CATGGCCTTT TTGACATTTG GGAAGAGATA GAAGGCCCCT TGCGGTTTGA 6300 

CCACTTCAAA TCCTGGTACC TCTGCAAGGA GGGGATAGAT GGTATTAAGA CGTTCCTCAA 6360 

AGGCCTGACG CATGCTTTCT ACAGTATCTT GCTCACCTGA TAGAGCCTCA ACTGCTGCAT 6420 

ATTGGGCTAC TGCTGACGGA TTCGAAGTTG TTTGACCTGC AATCTTGGAC ATGGCAGCGA 6480 

TAATGTCTGC TTCTCCAACG GCATAACCAA TCCGCCAACC AGTCATGGCA TAAGTTTTAG 6540 

ACACACCATT GATGACCACT GTTTGCTTGC GAATCGCTTC CGATAGGCTA GAAATCGGTG 6600 

TGAACTCATG ACCATTATAA ACCAAGCGGC CATAGATATC GTCTGCTAGG ATGAGAATAT 6660 

CATTTTCTAC AGCCCAGTTT CCAATTGCCA AGAGTTCCTC ACGGGTGTAA ATCATACCTG 6720 

TGGGATTAGA TGGCGAATTC AGCACCAAAA CCTTGGTCTT GTCAGTGCGA GCTGCTTCTA 6780 

ACTGCTCTAC GGTCACCTTA AAGTGATTGT CTTCCTTAGC AGAAACAAAG ACGGGAACGC 6840 

CTTCTGCCAT CTTGACCTGA TCTCCATAGC TAACCCAGTA TGGGGTTGGG ATGATGACTT 6900 

CATCACCTGG ATTGACCACA GCCATAAAGA AGGTATAGAG AGAATATTTG GCTCCCGCAG 6960 

CGACTGTCAC TTGATTTGAC GCTACAGAAT AGCCGTAAAA GCGCTCAAAG TAGCTATTGA 7020 

CGGCCGCCTT AAGCTCTGGC AGACCTGAGG TTACTGTATA AAAAGAAGCA CGCCCATCTC 7080 

GAATCGATGC AATGGCGGCA TCTTGGATAT TTTTGGGAGT AGTGAAATCT GGCTCACCCA 7140 

AGGTTAGAGA CAAAATATCT CTACCCTCAG CCTTCAGTGC TTTGGCACGG GCTCCAGCAG 7200 

CCAAAGTCAC ACTTTCTTCC ATTTCTAAAA CACGGTTGGA TAGTTTCATA GGCCCTCCTT 7260 

GTTGACCAAT GCTCCTGTTT CAAAATCTAC TAGATAAAAA TCAGATCCTG ACTTAACTTC 7320 

CCAGATTGGC TTATCTTGAT AACGGCCAAA GGTTATCTTG TCAATCTCGC CAGCTCCCTT 7380 

TTCCTTAGAA ACCGTTTCTG CTTTTTCTTG TGAAACACCC TGATTTAGCT GATAAACGTA 7440 
AATCTTATGG TCATCTTTAC CAATCAGGAC AGCAAGCGCT TCTTGCTGTT TGTTACGACC 7500 

AAGAACGCTG TAATAAGATT CCAAGCCATT GTATAAATCA ACCTGATCAG CCTGCTCTAA 7560 

TCCTGCATAC TGCTGAGCTA ATTTTTCTCC TTCACTTTTA GCTGTTTGAT AGGGTTTCAT 7620 

GCTAAGAGAA ACCATATACA GAAAGGAACC ACTGATAACC ACAAACAAAA TCGTCATCCC 7680 

TAGACCATAC TGCCACAGTA GATTATTTTT TGCTTTGTTT TGTCTTTTTT TCACTCGTCT 7740 

ATTTTACCAT CTATTAAGCT TTATTACAAG TGAATATAAG AATACTCTTC GAAAATCTCT 7800 

TCAAACCACG TCAGCTTTAT CTGCAGACCT CAAAGCTGTG CTTTGAGCAA CCAATTCTAT 7860 

TTCTCCCTTC AAACAAAACC GATTTTGAAA GTGAAACAGT TCTTACTTTT TCAGTCACAA 7920 

ATGATTAGAG TTTGCCGGG 7939 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9897 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

CCGCTCTACC GTCAAATAAT TACCATTTTG TTTAATACCG AAATTTTTAT CTACTGAAAA 60 

TTCAGTTGGT CTGTTGGTAC GATCGTCGTA TACAGTACCA TTCTCACGAA TAGTATAATT 120 

GTAATCAGTA TCACCTTGTT TCCTTAATTT AAGGTAATAA TTACCATCAA TTTGTTTATA 180 

ACCTGAATCT TTTCTAGTTG CTTCTCTAAA ACTTACTCCA GCAGGCATCA CATCAGCAAA 240 

CATGAGTACT TGTTTGTTCT TTTTTTCAAC AATAACAGAG TCAATATAGG TTGCACCACC 300 

GCTGATTTGT AAGTCACGTC CACCAACTTC ACGAGGCCAT TCTAATGGTA CTGGCGCAAA 360 

ATCATCGAAT GCCAATGTTA ATTTTGGTTT AGTCCATGTC TTACCATTAT CATCACTATA 420 

ACTTGTAGCA ATATTAATTT TATTCAAGAA ATCATGAGTT CCACCGTAAC GAGCGTCAAT 480 

GCTTGAAAAT ACCCGACCAT TGCTAAAAGT ATACAGAACT GGAATACGGA AATAGTTAGA 540 

ACCTGTTGTA TCATTAGCCG TATAAATTAA ATGTCCAGTA ACAGCGTTTG TTGTCATCTT 600 

TTTAACAGTT TCTTCATCCA ATGCACTATT AAAGAATTTG ATATTTTCTA GTGTTCCGTT 660 

AAAACCAAAC GCCGTTTTTC CTGCACGTTT CACTCCCCCA AGCATATAGT AATCAATACC 720 

TTTAATATCC TTGATGTTTA GGAAATTATC CACTTTCTTT TCTACTACTT TTGTACCATT 780 

TGCGTATAAA GAATATGTTT TTTTGACTGA ATCTGCTACT ACTGCAACAG TGTTAGTCAC 840 

AGCCTCTTGT TTGTACTTAC CCCAAACTGA AGCAGGTCTG GATACTAGGT TATTTTTATT 900 
GGAAGAAGTA TCACGCGCTT CCATCCCCAA CTCACCATTG TCTCTAAGGA ACACATCTAC 960 

ATAACTATTT TGTTGACCGG GTTTGGAATT AGATATTCCA AACAGAGCTT GTAAGCCTTT 1020 

CTCACTTGAC TGATTGTACT TAATCACTAC AGTAAAGTCA CCGCTAGTAA ATTTATCCTT 1080 

TAACTCTTTA GTAACATTTT CTCCGCCCCC TGTTAAAGTA ACATTATTTT TTTCTAAGAC 1140 

AGGAGTTTCT TCCGCTGTAG AAGATGGATC CTTAACAGTA GTTTCAACTG TTCGAGGTTG 1200 

TACAGTAACT TCCGAAGAGT TATCCGATGT AGGTTGTACT TCCGAAATCG GAGTCGTTGG 1260 

TGCAACAGGT TGCACCAACT TTGGTGTTGA TACTTCAGAA GTTTCAGTCT CCTGAGCTGC 1320 

AACTGAGTTA GCAACAAATG CTGATAATAC CACTACAGTA CCTAAGGTTA CATATTGTTT 1380 

AATATTTTTT TTCATTTTAT TTTTCCTCGT TTAAAACTTT GATAACAAGT TTTTTAACAG 1440 

TTTCATCATT GCAATGAATC TTTGGTTGGT GAAGATCTTC TTCAAAAGTC ACCAACATAT 1500 

TCCCTGGAAG CAATTCAACA ATTTGATAGT CTTTGCTATC GTAAAAAGCA ATATCCTTCT 1560 

CTTCGCTAAA AGGTACACGT GACTGGGCAC GAACTGGGGA AGTTACTGCC ATTTTTTCAG 1620 

TATTTTCAAC AACAATATGA ATATCTAAAT ATTTCTTATG AGTTTCAAAA ATATCTCCTG 1680 

GAACTCCATC AGCTAGATAA GTCATACAAT TTGCAAAAAC ATTTTCCCCG TCAATATCAA 1740 

TTTTTCCATC AACTAAATCT GTCAAATTTG TATTTTCTAA AAAATCACAG ACTTTTGAAA 1800 

AATATTTATT GACAGAAGCA TATCGTTTAA AATCAGATTG TTCAGAAATA ATCATATTAT 1860 

TTTCTCTTTT CTATTAGTGA CGAACTTCCC AACTTGAATC CGCTTTAATT TCTGTAATAT 1920 

CATGAATCGT TGTATATTTA GGTGCAGATA CTTTATTTCC AGTAAGAACA GATACAATAT 1980 

AACCTGAAAC TACTGATACA GAGATTGAAA TCAATGAATA TGCCCAGTAG CTAACAGCTG 2040 

TTGGAGGAAG GAAGTATTTA ATAAATACCA TGACGATGGT TGATACAATC AGCGCTGCAT 2100 

AAGCACCTTG TTTATTTGCT TTTTTAGAAA CAAATCCAAG AATAAATACA CCACCAAGTA 2160 

GACCAAGTAC AAGTCCCATG AAACTATTGA ACCATTCGTA TGCAGATTTA ATATCTGAGT 2220 

GAGCCATGAC AATGGAAACA CCAATTGAGA ATAAACCTAC TGCTAGAGAT ACGAATTGTG 2280 

CAATTTTCGT ACGACGATTG TCTGACATAT TTTTAGAAAT GACATCTTGA ATATCCAATG 2340 

TCCATGAAGT TGCAACAGAG TTCAAACCTG TTGAAATAGT TGATTGAGAT GCTGCATAAA 2400 

TCGCTGCCAA GATCAAACCT GTGATACCTA CTGGTAACTG GTATGCAATA AAGTACATAA 2460 

AGATTTGGTC TTGAGGGATA TTGCTAGCTG CACTATCTGC ATTTTGTACT TGATAGAATA 2520 

CGTACAAGCC TGTACCAATC AAGTAAAAGA CTGTTGCAGT TGCAAGTGAC AAAACACCGT 2580 

TTGTGAACAA CATCTTATTA AGTTTCTTAA TATTTTGTGT TGTAGTAAAA CGTTGAACCA 2640 
AATCTTGAGA TGAAGCATAG GAAGACAAGA TTGTAAAGCC TGAACCCATC ACAATTAAAA 2700 

AGATGGAGTT TGAAAGCAAG TTAGGATCGA AAAGTTTTTC ATTTGCAGCA AGGAATTTCC 2760 

CGTTTGCTAA TGTTTCTGCT ACTGCACCAA AGCCACCTTT AATATTAGCA ATCAGTACAA 2820 

ATAAAGCTAA AACGACACCA CTAATCAGAA TCACACCTTG AATAAAGTCT GTCCATAATA 2880 

CGGATTTTAG ACCACCAGTA TAAGAATAAA CAATTGCAAC TACACCCATC AAAATAATCA 2940 

AAATATTGAT GTCAATTCCT GTCAATACTG ATAAACCAGC TGATGGGAGG TACATAATGA 3000 

TAGACATACG TCCCAATTGA TAAATAATAA ACAAGAGTGC TGAAATAATA CGAAGTGCTT 3060 

TAGAATTAAA ACGTTTATCC AAGTAATCAT ATGCCGTATC GATGTCTATC CGTGCAAAGA 3120 

TAGGTAAGAT AAAACGAATT GTCAGTGGAA TAGCTACTAC CATCCCTAAT TGAGCAAACC 3180 

ATAAAATCCA GCTACCTGCA TAAGAGCTAC CAGCGAGTCC CAAGAAGGAA ATCGGACTGA 3240 

GCATTGTGGC AAAAATGGAT ACCGAAGTAA CATACCAAGG AACCGAACCA TCTCCTTTAA 3300 

AGAACTCTTT TCCTTTCATC TCTTTTTTAG AGAAATAGAT ACCTGCAACC AACACCGCAA 3360 

GTAAATAAAC AATCAAGATA ATTAAGTCAA TTATTGTAAA TCCTGTTGTG CCCATAACAT 3420 

ATCTCCATAT TGATTTTATT TATTATAAAA ATTCTTTTCG TGCTTGTTGA ATAAGTTCTG 3480 

CTGCTTGTTT TGCAACTTCC AAGTCACCTT CTGCCAATGC TTCTAAAGGT TGACGAACAG 3540 

AACCTAAATC AAGTTTTTCA TTTAGACGCA AAACTTCTTT TGCTACAGCA TACATATTTG 3600 
CCTTACCTGA TATCATCTTA TAGATAACTT CATTGATAGC ATATTGAAGT TTTTTAGCTG 3660 

TATCTAAATC TCGTTCTTGA ATCAAACTTT CCAATTTCAA GAACAAATCT GGCATAACGC 3720 

CATAAGTACC ACCAATACCA GCTTCTGCTC CCATCAAGCG ACCACCAAGA TATTGTTCAT 3780 

CTGGACCATT GAATACAATG TAATCTTCTC CACCTGCAGC TACAAACATT TGAATATCTT 3840 

GTACAGGCAT AGAAGAATTT TTAACTCCAA TCACACGAGG ATTTTGACGC ATTGTTGCAT 3900 

ACAAACTACC AGTCAACGCA ACCCCTGCCA ATTGTGGAAT ATTATAGATA ATAAAATCTG 3960 

TATTTGACGC AGCTTCACTC ATTGCATTCC AATATGCTGC GATTGAATAC TCTGGCAATT 4020 

TGAAATAAAT AGGTGGGATA GCTGCAATAG CATCGACTCC AACACTTTCT GAATGTTTTG 4080 

CCAATTCGAT ACTATCTTTC GTGTTATTAC ATGCAATATG GTTGATAACT GTTAATTTAC 4140 

CTTTAGCAAC TTCCATAACA GCTTCAATAA TTTGTTTACG ATCTTCTACA CTTTGGTAAA 4200 

TACATTCACC TGAAGAACCA TTTACATAGA TACCTTTTAC ACCTTTGTCA ATGAAATATT 4260 

GTACCAGAGA TTTTACACGA TCTTGGCTAA TTTCACCATT TTCATCATAG CAAGCATAAA 4320 

ATGCAGGGAT AACGCCTTTG TATTTAGTTA AATCTTTCAT CAGATTTCTC CTTTATATTG 4380 

TTTTTTATTT GATGACATTA ATAAATCGCT GAGCAATTTC TTTTGGACGT GTAATCGCTC 4440 
CACCAATGAC TACACTGGTA ACACCTAAAC TATAAGCTTT TTTTAATTGT TCTGGATAAT 4500 

GAATTTTTCt TCGGCAATTA CCGGAATATT AAAATCAGCC AATTTTTTCA TTAGTTCAAA 4560 

ATCAGGCTCA TCTGATTGTA CACTTGTACT TGTGTAACCT GATAATGTTG TACCAACAAA 4620 

ATCAACGCCT GATTTAAATG CATAGAGACC TTCATCTAAA TTACTTACAT CCGCCATCAG 4680 

CAATTGATTC GGATATTTTT CTTTTATTTT TTTGATAAAT TCACTGACAA CTAAGCCATC 4740 

ATATCTTGGT CTTAAAGTTG CATCAAATGC AATGACTGTT GTTCCGCATT CTACAAGTTC 4800 

ATCTACTTCT TTCATCGTAG CAGTAATATA TGGTTCTTGA GGTGGATAAT CCCTTTTGAT 4860 

AATTCCAATT ATTGGTAAAT CTACTACTTT CTGAATTGCT TTAATATCAC GCACAGAATT 4920 

TGCGCGAATG CCCACTGCTC CTGCCTCTAA AGCTGCTTTA GCCATAAAAG GCATCAAGCT 4980 

AAATTCTTCA TTATAAAGGG CTTCACCAGG TAAAGCTTGA CAAGAAACAA TGACTCCACC 5040 

TTGAACTTGG CTTATAAATT TTTCTTTAGT CCAAATTTGG CTCATTTTAT TATTCCTCCT 5100 

TATGGATAAT AGTTTGATTG TAATAATATT GTCTCTCTGG ACTTTCCAGA TAATTAGAGA 5160 

ATAAGCAGTC TGTAATTAAA AGTATTGGAA ACTGAGGTGA TATGCGATTG CCATACGAGA 5220 

GATGATCGGT CGAAGCTAAT AACAATAGTT CATCAAAGAA ACAATCTTCT TCGTCAAATT 5280 

TTCTTGTAGT CATTAAAACT GTTTTAGCGC CTTTATCTGC AGCTTTTTGT AGACCTTCTA 5340 

GTACAATATC AGTTTGACCT GAAATGGATG CTCCAATGAC AAGGCAATTT TCATTAAGTA 5400 

GTAAGCTACT CCACAAAATC ATATCCTCGT CTGATAATAC TTCACCAATC ACTCCGAGAC 5460 

GCATAAATCT CATCTTCATT TCTTGTAAAG CAAGAACAGA ACTTCCTTTA CCGTAGAGAT 5520 

ATACACGCTC AGCAGTTTCT ATCATCTCAG CAATACGCTC AAGTTGAACT TCATCAAGAA 5580 

CCGTGTAAGT TTTTCTCAAC ATTTCCTCAT AGTCGGATAA AACTTTTTCT GTTGCCTCTG 5640 

TATATAATGC CAACTTTTCT TTCTCATGAA TCATCTCTTG GTATTTGAAA ATGAATTGTC 5700 

TAAAACCTTT AAAACCACAT TTTTTCGCAA ATCGAGTCAA TGTTGCTTTG GATACATTAA 5760 

GGTATTCGCA CAATGCTTTA GATGAATAAT CATTCAGAGG TTGCTGTTTT AAGAAGAATT 5820 

TAGCAATGTC TTTTTCAGCA TATGCCATAT TTGGTAAGTT AGCTTCTATC ATTGGAATTA 5880 

GTTCTTTTTG CAGTAACATA TGAGCTCCTT AGTTGAAGTA AACGTTTACA TTCTTTATTT 5940 

TAACACTTTT TTTTTTTTTC AATATTTTTC ATAAATTAGA AACTAGTTTC CAATTTCTTT 6000 

CGTTTCATAA CAGAACAACA AACATAAAAA TATAATAGTT TTTATTCTTT TTATCGTAAT 6060 

TATATGTATT GTAAGAACGT TTATCACTAA TAATATGTTC ATATTAAAAT ATTTTAGTAA 6120 

TATTTTATTT TGGTTTTATT ATTTCTTTTC GGAATTTCTA TATAATATTT TATTTCTAAA 6180 
AAAATTGAAA AAATATTTCT AGTTTCTTTA TTTTATATAG GTAATATATT TTATTTCTAA 6240 

ATTAAAAGAG AATCCCATAA AAACTACAGA TTTATGAGAT AAATCAGGTC ACCTATTTTA 6300 

AAAAAGCAGC AAACTATAAA CTAAAAAGTT CCACACCAAA TGTAACCCCA TACTTCCCCA 6360 

TAAGTCAGAT TTATAGCGCA CCATACCTAA AAACATTCCA AGTGAAACGT ACAGACACCA 6420 

AGCTAGAATG GTTCCTGGAT GATGTACTAA GGCAAATAAA ACACTTGTCA AAGCAACTCG 6480 

AATATCTAAT TTTCTAACCA AGTTCCATAA AATTTCACGA TACAGAAATT CTTCAACCAT 6540 

ACTCGCATTG ATTAAGAACA ATAAAAATGA AAACCAAGGA ACTTGATGTT GAAGGCCAAT 6600 

TAAATTTGTT TGATTCGTGC TTCCTTGAGC ATGAATCAGG CTAAAACATA GACTTATAAT 6660 

CAGTAGACTA GCTAGTCCAA TACCAAGGCA TTTCATCCTA GTTTTCATAT TGACCTTGAC 6720 

CACTTGTTTT CGTTGACCAT ACATCCATAA AAAAGAAAAA AGAGACGCAC CATAGAGAAC 6780 

CTGTAGTATA GTTAACTCAC CGATACAAAG AAATTTCAAT AAGTATAGAG ATACCAATAG 6840 

GACATTTACT TGTTGGAATA TATAAACTGG AATTATTCTT TTCATAGTTA CCTCCGAAAT 6900 

AAATCTTCAT AATCTAAATC TAATATCTGC ACAATCCTTT CTACCCATGG ACTTTGAGGC 6960 

ATTCGTTGTT CCATCTTGTA GTGGCGAATC TTTTGATATA AACGATTCAA TTCACTTGGA 7020 

TAGTGAAACT CTCCCGCAAA CATTTTTCTG GTTAACTCAA TCCAGCTGAT ATTTCTTTCA 7080 

GCCAAAATAA TGGACAAGTT CTCCCAAAAT CGTTCAGCCA TATTrCTTCT CCTTTAGTTA 7140 

GATAAATAAT GTGTTTGyGC CATGTAAATC AATTGTTTCG TATCTCTTGG CAATAGAGCT 7200 

CTAGCCTCTT CCAAATTCAG ACTTGGATAA ACCCGCTTAT TTGAAACCAC AAAAGGAAGT 7260 

CCGATGGTTA GTTCAGGATT TTTTAAAATT ATCTCAACGA AATCCGTTAA TCTTAGATTG 7320 

TCACGGTTCT TAAATCGTAA TAAATTGGGA GATAAAAACT CAAAACAATC TGAAGAATAG 7380 

CTCATCATCT CAATTAATTT GTCCTTTGTC ATTTCAGAAA CTGAATGACA AGATACCTCA 7440 

ATGCCATAGT TTTGGAAGAA GTCTAAAAGA AGTTGATTTC TTTGGCTATT TTTACTTAGA 7500 

TAGAGATCAA TCATGGGAGA CCTCCAACAA ATTTGCTTCC ATTTGATATT CTGAGACGAT 7560 

TAAGGAATCT AACAACTTTG AGAAGTTAAT CGATTTCTTG TCTTCATCAT AAGCTTTTAC 7620 

AGTTACTTGG GTTGTAAGTA TCCCCTCTTT TCCCTCGGCT CGATAGTCTT GTCAATATAA 7680 

AACAAAAACA AGATTCTGAT TATCATCTAC AAAGGCATTA ACTCCGTTCT TTATATCCTG 7740 

ACTTTCAAGG AATTCCATAA CGTTTTGAAG ATAGGATTCA TAAAATAGTG GGTAATTATG 7800 

TTTTTTATGG TAATCATCTA AAAATGTTAC CTCAAACTCA CATGGATAAT TGGGCATCAA 7860 

AAATATTTGT TCATCCAGCT GTTTGATTTC TGCATCATGT AATTCTGTTT CTAATTCATC 7920 

ACAATCTAGT ATTGATTCTT TATTTAATGC TTTTATCTTT TTCCTCTATT TCTTTTAATT 7980 
TCTTTGCGAT TGCGGCAATC ACAGGAACGG TTACACTATT ACCAACTTGT TTATAGAGCT 8040

GACTATTAAT AGAGACTTTT CTAGCAGCTT CAAAAGCCTA ATCAGGAAAG CCATGCAATC 8100 

GAAAACACTC TTTAGGAGTG ATTCGTCGTA TTCTCAAACG GTAAAATTGT CCATCTATTA 8160 

AAACACCAGC TACTTGGTAA ACTTGTTTAT CTTCTCCTTC ATAGCTAGCC ACTACTACTC 8220 

CCATTTGACC ACTAGTTGTT AACGTATTAG CTATACCTTT TCCAACTCTA CCACGACGAT 8280 

ACTGAGAACT TGGTCTTTCT AAATTGATTG AATCCCCAAT CTCTGCTTGA GCATATCCTT 8340 

TTTTCGTTGC TTCCCGTACT TTTAGAAATT GGATTGGTTC TGGAATTAGT ATTTTGGGGA 8400 

TTTTATCTCC TCCTTGCATC GTAGTCAGTG TTGGAGATAA GCCCTCACTT CCATAGACAC 8460 

GACCTGTCTC CTTAAAGCTA GTCGGTAAAT CTCCAACAAC GACAATGCCA TAACGATCCT 8520 

GAGTATTTAA AGTAAACATC GGCTCTTGAT TTTCCTTAAA GCGTCTCCCA TTTTGTCTCT 8580 

TGTCTAATCT ATCTGGTGTC ATACAAGGAA TCGCAACTTT AAATCCTTCT CCTTTACCAC 8640 

GAACTAAGGT TGGCGCAAGA CCTTCTGAAT AATAGACTTT ACCGCTCATT CCACTTCTTG 8700 

ATGGATTCAA ATTTCCTAGT GCTTTCAAAG TCTCAGAGTT AGTTGCTTGA CCTTCTCGTC 8760 

TGAAAGGAAA TAAGAGTCTG GTACCTTTCT TTCTAGAATG TCCGATAATA AAGACCCTCT 8820 

CTCTGTTTTT GGGAACGCCA AAATCCTTAC TGTTAAGCAC CTGCCACTCA ACATCAAACC 8880 

CCAACTCATC AAGTGTGGTA AGTATTGTGG TGAACGTCCG TCCCTTATCG TGATTGAGTA 8940 

GGCCTTTAAC ATTTTCAAGA AAAAGAAAAC GTGGTTGGAT TTGTTTGGCC GCCCGAGCAA 9000 

TTTCAAAGAA CAAAGTTCCT CTAGTATCTT CAAATCCCAA TCGTCTTCCT GCGATTGAAA 9060 

ATGCTTGACA AGGGAATCCC CCACAGATGA CATCGACTTT CCCTCTAAGT TTTTTAAATT 9120 

CGTCATCTGA AACATCTCGT ATGTCATGAA ATTCTATTTC TCCTTCCGTT TGAAAAATGG 9180 

ACTTATAAGA TTTCCTAGCA AATTTATCAA TCTCACAAAA TCCCAAGCAC TCATGCCCTT 9240 

GAGCTTCCAT TCCCATCCTA AAGCCTCCTA TCCCAGCAAA TAAATCTAAA ACCCAAATCA 9300 

TTCATACCTC TCTCAACTAG ATGTAACTTA CAAAACCCCT GACCTCATGA GCCACTTTCT 9360 

TCCTCCTCAT GAGGTCAGTT TTACTTTCTG CTGTTCCAGT ATCGTTTTTC CTCGCTAGAT 9420 

TTCCTCAAAA GGGCAGACTC CTCCCTTGGT TCGTCACACG ATTTTTTCAT CTCGACTGTT 9480 

CTTTAATGCA TCATTAACGA CGCTTTTCTT CTAGGTGGTT CATAAGGAAC AGGAAGATTC 9540 

AGGTTGACTT TTCTAATCCT AGAATAAAGT GCTGAAAACA ATTCGGAATA GGCATAGAGA 9600 

CTAGACAATT TGAGGAGCTG CTTGCGTCCT GTTCGAACAC ATTTTCCTAC CACGTGAAGA 9660 

AAAAGATGGC GGAAGCGTTT GATTGTTAAA GTTTGGAAGT CACCTCCAGC TAGATGTTTG 9720 
AGAAAAAGAT AGAGATTGTA GGCGATACAG CTCATCATCA TACGAACTCG TTTTTGATTA 9780

AGGTTGAACT ATCCGTTTTA TCGCCAAAAA ATCCCTCCTT CATCTCCTTG ATGAAATTCT 9840 

CGGCTTGACC ACGTCCACGA TAAAGCTGAA ACTGGTCTTG GCTTGTTCCG GTACCGA 9897 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 8148 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY : linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

CCGTGGAACA AGCCAAGACC AGTTTCAGCT TTATCGTGGA CGTGGTCAAG CCGAGAATTT 60 

CATCAAGGAG ATGAAGGAGG GATTTTTTGG CGATAAAACG GATAGTTCAA CCTTAATCAA 120 

AAACGAAGTT CGTATGATGA TGAGCTGTAT CGCCTACAAT CTCTATCTTT TTCTCAAACA 180 

TCTAGCTGGA GGTGACTTCC AAACTTTAAC AATCAAACGC TTCCGCCATC TTTTTCTTCA 240 

CGTGGTAGGA AAATGTGTTC GAACAGGACG CAAGCAGCTC CTCAAATTGT CTAGTCTCTA 300 

TGCCTATTCC GAATTGTTTT CAGCACTTTA TTCTAGGATT AGAAAAGTCA ACCTGAATCT 360 

TCCTGTTCCT TATGAACCAC CTAGAAGAAA AGCGTCGTTA ATGATGCATT AAAGAACAGT 420 

CGAGATGAAA AAATCGTGTG ACGAACCAAG GGAGGAGTCT GCCCTTTTGA GGAAATCTAG 480 

CGAGGAAAAA CGATACTGGA ACAGCAGAAA GTAAAACTGA CCTCATGAGG AGGAAGAAAG 540 

TGGCTCATGA GGTCAGGGGT TTTGTAAGTT ACATCTAGTT GAGAGAGGTA TGAATGATTT 600 

GGGTAAATAC AATGAGCTTG AAAGAAGTAG CAAACTCACC AAGCGCCAAT TCTTTGAGAA 660 

TCAGATGCTG GATTATACCA TCATTGCGCA TGAGAGTTTT GAAATCATCC GTCATTCTGT 720 

CTACCAGACA GATGATCGTG AAGTGGAAAA TGCTCTGGCT TTTGAAGTGA AAAATGATGA 780 

AACAGACAAG CTGATTCTGT TATTAAGCGA GGATATTGGT GTAGGTGAAA AATTGTGCCT 840 

CGTTGACGGA ACAAAAATGC GTGGAAAATG TTTAGTATAT GATAAAATAA ATGAGAGAAT 900 

GATTCGCTTG CAGTGCTAGA AATAGGCATT TTGAATAGTG AATATGTTAT AATAAGTATT 960 

AGTAGGAGGT GTTTTAGATT GGAGAAGAAA CTGACCATAA AAGACATTGC GGAAATGGCT 1020

CAGACCTCGA AAACAACCGT GTCATTTTAC CTAAACGGGA AATATGAAAA AATGTCCCAA 1080 

GAGACACGTG AAAAGATTGA AAAAGTTATT CATGAAACAA ATTACAAACC GAGCATTGTT 1140 

GCGCGTAGCT TAAACTCCAA ACGAACAAAA TTAATCGGTG TTTTGATTGG TGATATTACC 1200 

AACAGTTTCT CAAACCAAAT TGTTAAGGGA ATTGAGGATA TCGCCAGCCA GAATGGCTAC 1260 



WO 98/18931 



PCT/US97/19588



221 

CAGGTAATGA TAGGAAATAG TAATTACAGC CAAGAGAGTG AGGACCGGTA TATTGAAAGC 1320 

ATGCTTCTCT TGGGAGTAGA CGGCTTTATT ATTCAGCCGA CCTCTAATTT CCGAAAATAT 1380 

TCTCGTATCA TCGATGAGAA AAAGAAGAAA ATGGTCTTTT TTGATAGTCA GCTCTATGAA 1440 

CACCGGACTA GCTGGGTTAA AACCAATAAC TATGATGCCG TTTATGACAT GACCCAGTCC 1500 

TGTATCGAAA AAGGTTATGA ACATTTTCTC TTGATTACAG CGGATACGAG TCGTTTGAGT 1560 

ACTCGGATTG AGCGGGCAAG TGGTTTTGTG GATGCTTTAA CAGATGCTAA TATGCGTCAC 1620 

GCCAGTCTAA CCATTGAAGA TAAGCATACG AATTTGGAAC AAATTAAGGA ATTTTTACAA 1680 

AAAGAAATCG ATCCCGATGA AAAAACTCTG GTATTTATCC CTAACTGTTG GGCCCTACCT 1740 

CTAGTCTTTA CCGTTATCAA AGAGTTGAAT TATAACTTGC CACAAGTTGG GTTGATTGGT 1800 

TTTGACAATA CGGAGTGGAC TTGCTTTTCT TCTCCAAGTG TTTCGACGCT GGTTCAGCCC 1860

TCCTTTGAGG AAGGACAACA GGCTACAAAG ATTTTGATTG ACCAGATTGA AGGTCGCAAT 1920 

CAAGAAGAAA GGCAACAAGT CTTGGATTGT AGTGTGAATT GGAAAGAGTC GACTTTCTAA 1980 

AATGAAGGAA AATGACTTGC AATCTCTGTT AAGAAATAAA ATAATCCCAC CTAGAACAAG 2040 

CTAGGTGGGA TTATTTGCCT ATGAAATGAG AAATTATGGG AGCAAGCTCC TAAATCAACT 2100 

GTTTTTGATC TACTTCTTTA ACTACTTGAT AAAAGTTATA GAAGTAGGCC AAACTTGAAA 2160 

TGATGGTTAC GACTAGGAAT ATTGAAAATT TCCATTGGAC AGGGTTGGTT AAAAGTTGTG 2220 

GAAAGGATAT GAGGAGAAAG AAGAGGGCTG CGTTGAGGAC AGGTATCCGT TTTGATTGTA 2280 

TTTTCTCAAG TCCTTTATTG AGCGCAGGAA GAAAGAGGAG TAGGAGTAGT AAAACTGTAT 2340 

GAGAAATAGC TCCTGAAGTA AGGGCGAAGA AAAGGAAAAT ACTGATAAAA ACATGAATGA 2400 

TCAGTAGTCT AGCTAGTGAT TTCATAAGGC ACCTCCTAAT CCTGGTCTTT TTTAGCTCTT 2460 

GCAATACGAA GTGAGTCGAC AATATGTATC ATCACTCCGA AAAAGAAAGC TCCCAGTATA 2520 

GTTTTAAAAA TATGTTTTGT ATTTAGAAGA GAACTGATAA AATTTGGATT TTCACTTGTT 2580 

AGGGTATCAA TGAGTGGAAT TATAAAAAAT ATCACTGTTC CATAAATCGA ACCTGCTTTC 2640 

AGACCAGGAT AACGTAACTG TTTCTTTTCT TTTTTCATGA GTTTCCTCCT AATCCTCATC 2700 

TTGATTTTTC TTAGTTTTTG CAATGCGACG GGAGATGAGG AACTGTATGC TCGCTCCGAA 2760 

GAAAATAGAA CCGAGAATAC TTGATACACC ATTTCTTATA GTGAGAAGAG AATGAAAATA 2820 

GTCCTGACCT TCATCTATGA GTATCCTGAG AAGAGGAGTT ATAAAAAACA TCCATAGACC 2880 

AAAGAACAAA CCTGCTTTCA GACCTGGGTA GTGTAGTTGC TTGCTTTCTT TCTCATTCAG 2940 

CATATCTGGT TCAATGACTG TGATGCCTGT TTTTTTCATT TGGTAGGTGA CATAGCCAGA 3000 
AGCGATGAGG GCAATCACTA AAATCAGAGG AGGATAGATT AGAGCCACTT CTTGAGGGTA 3060

TTTATAGGCC AGAAGGAGTG GAATAAGATT TCCGAAAATC ATCAGATAAA AGAGGATGAT 3120

AAAGACTTGG TTCCCAATAC TATCGGCCTC ACGCCGTTTG TATTCGTCAA GGGGACCAGA 3180

AATACCGTAT GTGCGTTTGA TCAGTTTTTC AGTGAAGGTT TCTTTTTTCA TGAGTTTGCT 3240

CCTTTTTTAA AAATCTTCCT CCCAAAAGAG ACTGTTGAGG TCAGTTTGGA GGCTGCGGGC 3300

GAGATTGAGA CAGAGTTCCA AGGTTGGATT GTACTTGTCG TTTTCAATCA TATTGATAGT 3360

CTGTCTCGAG ACACCGATAT CCTTGGCGAG TTCGAGCTGG GAAATACCCA ATTCCTTGCG 3420

AAATTCTTTC ACACGATTCA TCTGTTCTCC TTTCTGATTT ATGTCGTATA TATTTGACTA 3480

TATTATAGTC TTTTAAACAT AAAGTGTCAA GTATTTTTGA CATATTTTTT GAAGAAATAG 3540

TAGTCTCCTT GTCCTATTTG TCTGACAAGT GCAAGCTGGT CGGATTTGTG GTAAAATAGA 3600

TAAGATATGA CAAAAGAATT TCATCATGTA ACGGTCTTAC TCCACGAAAC GATTGATATG 3660

CTTGACGTAA AGCCTGATGG TATCTACGTT GATGCGACTT TGGGCGGAGC AGGACATAGC 3720

GAGTATTTAT TAAGTAAATT AAGTGAAAAA GGCCATCTCT ATGCCTTTGA CCAGGATCAG 3780

AATGCCATTG ACAATGCGCA AAAACGCTTG GCACCTTACA TTGAGAAGGG AATGGTGACC 3840

TTTATCAAGG ACAACTTCCG TCATTTACAG GCATGTTTGC GCGAAGCTGG TGTTCAGGAA 3900

ATTGATGGAA TTTGTTATGA CTTGGGAGTG TCTAGTCCTC AATTAGACCA GCGTGAGCGT 3960

GGTTTTTCTT ATAAAAAGGA TGCGCCACTG GACATGCGGA TGAATCAGGA TGCTAGCCTG 4020

ACAGCCTATG AAGTGGTGAA CAATTATGAC TATCATGACT TGGTTCGTAT TTTCTTCAAG 4080

TATGGAGAGG ACAAATTCTC TAAACAGATT GCGCGTAAGA TTGAGCAAGC GCGTGAAGTG 4140

AAGCCGATTG AGACAACGAC TGAGTTAGCA GAGATTATCA AGTTGGTCAA ACCTGCCAAG 4200

GAACTCAAGA AGAAGGGGCA TCCTGCTAAG CAGATTTTCC AGGCTATTCG AATTGAAGTC 4260

AATGATGAAC TGGGAGCGGC AGATGAGTCC ATCCAGCAGG CTATGGATAT GTTGGCTCTG 4320

GATGGTAGAA TTTCAGTGAT TACCTTTCAT TCCTTAGAAG ACCGCTTGAC CAAGCAAT^G 4380

TTCAAGGAAG CTTCAACAGT TGAAGTTCCA AAAGGCTTGC CTTTCATCCC AGATGATCTC 4440

AAGCCCAAGA TGGAATTGGT GTCCCGTAAG CCAATCTTGC CAAGTGCGGA AGAGTTAGAA 4500

GCCAATAACC GCTCGCACTC AGCCAAGTTG CGCGTGGTCA GAAAAATTCA CAAGTAAGAG 4560

GGAAAAAGAT GGCAGAAAAA ATGGAAAAAA CAGGTCAAAT ACTACAGATG CAACTTAAAC 4620

GGTTTTCGCG TGTGGAAAAA GCTTTTTACT TTTCCATTGC TGTAACCACT CTTATTGTAG 4680

CCATTAGTAT TATTTTTATG CAGACCAAGC TCTTGCAAGT GCAGAATGAT TTGACAAAAA 4740

TCAATGCGCA GATAGAGGAA AAGAAGACCG AATTGGACGA TGCCAAGCAA GAGGTCAATG 4800



AACTATTACG TGCAGAACGT TTGAAAGAAA TTGCCAATTC ACACGATTTG CAATTAAACA 4860

ATGAAAATAT TAGAATAGCG GAGTAAGATA TGAAGTGGAC AAAAAGAGTA ATCCGTTATG 4920

CGACCAAAAA TCGGAAATCG CCGGCTGAAA ACAGACGCAG AGTTGGAAAA AGTCTGAGTT 4980

TATTATCTGT CTTTGTTTTT GCCATTTTTT TAGTCAATTT TGCGGTCATT ATTGGGACAG 5040

GCACTCGCTT TGGAACAGAT TTAGCGAAGG AAGCTAAGAA GGTTCATCAA ACCACCCGTA 5100

CAGTTCCTGC CAAACGTGGG ACTATTTATG ACCGAAATGG AGTCCCGATT GCTGAGGATG 5160

CAACCTCCTA TAATGTCTAT GCGGTCATTG ATGAGAACTA TAAGTCAGCA ACGGGTAAGA 5220

TTCTTTACGT AGAAAAAACA CAATTTAACA AGGTTGCAGA GGTCTTTCAT AAGTATCTGG 5280

ACATGGAAGA ATCCTATGTA AGAGAGCAAC TCTCGCAACC TAATCTCAAG CAAGTTTCCT 5340

TTGGAGCAAA GGGAAATGGG ATTACCTATG CCAATATGAT GTCTATCAAA AAAGAATTGG 5400

AAGCTGCAGA GGTCAAGGGG ATTGATTTTA CAACCAGTCC CAATCGTAGT TACCCAAACG 5460

GACAATTTGC TTCTAGTTTT ATCGGTCTAG CTCAGCTCCA TGAAAATGAA GATGGAAGCA 5520

AGAGCTTGCT GGGAACCTCT GGAATGGAGA GTTCCTTGAA CAGTATTCTT GCAGGGACAG 5580

ACGGCATTAT TACCTATGAA AAGGATCGTC TGGGTAATAT TGTACCCGGA ACAGAACAAG 5640

TTTCCCAACG AACGATGGAC GGTAAGGATG TTTATACAAC CATTTCCAGC CCCCTCCAGT 5700

CCTTTATGGA AACCCAGATG GATGCTTTTC AAGAGAAGGT AAAAGGAAAG TACATGACAG 5760

CGACTTTGGT CAGTGCTAAA ACAGGGGAAA TTCTGGCAAC AACGCAACGA CCGACCTTTG 5820

ATGCAGATAC AAAAGAAGGC ATTACAGAGG ACTTTGTTTG GCGTGATATC CTTTACCAAA 5880

GTAACTATGA GCCAGGTTCC ACTATGAAAG TGATGATGTT GGCTGCTGCT ATTGATAATA 5940

ATACCTTTCC AGGAGGAGAA GTCTTTAATA GTAGTGAGTT AAAAATTGCA GATGCCACGA 6000

TTCGAGATTG GGACGTTAAT GAAGGATTGA CTGGTGGCAG AACGATGACT TTTTCTCAAG 6060

GTTTTGCACA CTCAAGTAAC GTTGGGATGA CCCTCCTTGA GCAAAAGATG GGAGATGCTA 6120

CCTGGCTTGA TTATCTTAAT CGTTTTAAAT TTGGAGTTCC GACCCGTTTC GGTTTGACGG 6180

ATGAGTATGC TGGTCAGCTT CCTGCGGATA ATATTGTCAA CATTGCGCAA AGCTCATTTG 6240

GACAAGGGAT TTCAGTGACC CAGACGCAAA TGATTCGTGC CTTTACAGCT ATTGCTAATG 6300

ACGGTGTCAT GCTGGAGCCT AAATTTATTA GTGCCATTTA TGATCCAAAT GATCAAACTG 6360

CTCGGAAATC TCAAAAAGAA ATTGTGGGAA ATCCTGTTTC TAAAGATGCA GCTAGTCTAA 6420

CTCGGACTAA CATGGTTTTG GTAGGGACGG ATCCGGTTTA TGGAACCATG TATAACCACA 6480

GCACAGGCAA GCCAACTGTA ACTGTTCCTG GGCAAAATGT AGCCCTCAAG TCTGGTACGG 6540
CTCAGATTGC TGACGAGAAA AATGGTGGTT ATCTAGTCGG GTTAACCGAC TATATTTTCT 6600

CGGCTGTATC GATGAGTCCG GCTGAAAATC CTGATTTTAT CTTGTATGTG ACGGTCCAAC 6660 

AACCTGAACA TTATTCAGGT ATTCAGTTGG GAGAATTTGC CAATCCTATC TTGGAGCGGG 6720 

CTTCAGCTAT GAAAGACTCT CTCAATCTTC AAACAACAGC TAAGGCTTTA GAGCAAGTAA 6780 

GTCAACAAAG TCCTTATCCT ATGCCTAGTG TCAAGGATAT TTCACCTGGT GATTTAGCAG 6840 

AAGAATTGCG TCGCAATCTT GTACAACCCA TCGTTGTGGG AACAGGAACG AAGATTAAAA 6900 

ACAGTTCTGC TGAAGAAGGG AAGAATCTTG CCCCGAACCA GCAAGTCCTT ATCTTATCTG 6960 

ATAAAGCAGA GGAGGTTCCA GATATGTATG GTTGGACAAA GGAGACTGCT GAGACCCTTG 7020 

CTAAGTGGCT CAATATAGAA CTTGAATTTC AAGGTTCGGG CTCTACTGTG CAGAAGCAAG 7080 

ATGTTCGTGC TAACACAGCT ATCAAGGACA TTAAAAAAAT TACATTAACT TTAGGAGACT 7140 

AATATGTTTA TTTCCATCAG TGCTGGAATT GTGACATTTT TACTAACTTT AGTAGAAATT 7200 

CCGGCCTTTA TCCAATTTTA TAGAAAGGCG CAAATTACAG GCCAGCAGAT GCATGAGGAT 7260 

GTCAAACAGC ATCAGGCAAA AGCTGGGACT CCTACAATGG GAGGTTTGGT TTTCTTGATT 7320 

ACTTCTGTTT TGGTTGCTTT CTTTTTCGCC CTATTTAGTA GCCAATTCAG CAATAATGTG 7380 

GGAATGATTT TGTTCATCTT GGTCTTGTAT GGCTTGGTCG GATTTTTAGA TGACTTTCTC 7440 

AAGGTCTTTC GTAAAATCAA TGAGGGGCTT AATCCTAAGC AAAAATTAGC TCTTCAGCTT 7500 

CTAGGTGGAG TTATCTTCTA TCTTTTCTAT GAGCGCGGTG GCGATATCCT GTCTGTCTTT 7560 

GGTTATCCAG TTCATTTGGG ATTTTTCTAT ATTTTCTTCG CTCTTTTCTG GCTAGTCGGT 7620 

TTTTCAAACG CAGTAAACTT GACAGACGGT GTTGACGGTT TAGCTAGTAT TTCCGTTGTG 7680 

ATTAGTTTGT CTGCCTATGG AGTTATTGCC TATGTGCAAG GTCAGATGGA TATTCTTCTA 7740 

GTGATTCTTG CCATGATTGG TGGTTTGCTC GGTTTCTTCA TCTTTAACCA TAAGCCTGCC 7800 

AAGGTCTTTA TGGGTGATGT GGGAAGTTTG GCCCTAGGTG GGATGCTGGC AGCTATCTCT 7860 

ATGGCTCTCC ACCAAGAATG GACTCTCTTG ATTATCGGAA TTGTGTATGT TTTTGAAACA 7920 

ACTTCTGTTA TGATGCAAGT CAGTTATTTC AAACTGACAG GTGGTAAACG TATTTTCCGT 7980 

ATGACGCCTG TACATCACCA TTTTGAGCTT GGGGGATTGT CTGGTAAAGG AAATCCTTGG 8040 

AGCGAGTGGA AGGTTGACTT CTTCTTTTGG GGAGTGGGAC TTCTAGCAAG TCTCCTGACC 8100 

CTAGCAATTT TATATTTGAT GTAAGAATGG CACCCTGATG TTTCAGGG 8148 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9909 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS : double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

TACTCCACCC TTAATATCCG TTCCTGTAAA TACTTTACCG CTTTTAAGTT CATAGAATTG 60 

AACTTTTAAA TGCTTGTCTT CAAGCATCTT TTCCATCCAA TTTTTAGGAG TTTGACCAGC 120 

TTTAAATAAA AACCTTGCTG GGGTGATTAG TATAGATTTA TCTGCGATTT TATAAGCTTC 180 

ATCAATAAAA TAGTGATATA TCGGCTCATC TCTGGCTTCT CCTGTTTCCT GATACGGAGG 240 

ATTTCCTATC ACGACATCAA ATTTCATTTC ACTTTCCTCG CTAGATAGGC GCTCAAAACC 300 

TATCATTCTA TTCTTTTTCC AGTCTTTGAT ATGGGTTTTA GATTCTTCTA CTTCTTGGAC 360 

TTCTAGCTCA TCCGCAAACA AACTCAATTG TTGAGATTGC TTTTGTTTAG CTGAATAAGG 420 

ACTACTTTTT TTCAATCCAT CCATCTGAAA GACATTGTAA GAGATAATAG TCGCAATTTC 480 

TTTCTTTTGC TCTAATGTTG GTTGATTTCC AGTCTTAGCT AGATAATAGT CCTCAAAAGT 540 

TGCCAAAAGA TTCTCACGCG CCAAAAGGAG AGAATCTCCT TGATACTCAT AACCATACGA 600 

AGCATGATAA GCATCTTTTA CAAGTTTATA AAATGTGACT TCATCTGAAA CCTCACGACT 660 

AATCCGTTGC AGTTTTCTAT CAACAAAACC AACTCGCTCA GATAATGGAA TTTCCTCACC 720 

AGTTACGGTA TCATATCTCG TTACCATATA AGGTGCTTCA CCACAAGTTA CCTCTAACCA 780 

TCGTAAGTCC ACATACTCCT CAAGACTTAA CGAGCCTAAT TTCGATTCTA CATATCCATT 840 

TTGCTTTGCG ACCAACCACG TTGGTGTAAA CACTTCTGCC CTTATTTTTG TCCGATCTTT 900 

TTGTTCATAT TTGGATTTTT CAGATCTGGG CTGAATCAAG TTGGCAAAGT TTCCAGTAAC 960 

CTTACTTGGA TTGATGCGAT CACTTGGAGC AAATCCCTTT CCTAACAATT CATAAGAATG 1020 

CGTAnGCCAA ACAATTGATT TCTTTGTCGT TCGATCTTTT AAAAGAATTT TTAATAAGTC 1080 

AGCCGATTCT TTAGCCAAAC TTTCTTCACT AATATCTATT GTCATCAGCA ACCTCTCTTA 1140 

TATTGTAAGC CCTATTATAT CATATTTTAA AGAATGAAAA TTTACTTGAA AAAAGTAATT 1200 

CAATAAATAT CTCTCCGATG ACCAACTTCT AGAGTAGCAA CGACTAATTC ATCATCTACA 1260 

ATTTGTACGA TAACTCGATA ATTACCAATT CTATAGCGCC ATTGACCAAC GCGATTACCA 1320 

ACCAAAGCCT TTCCGTGTCG TCTTGGGTCT TCCAAAACAT TGGTTTGTAA ATAGTTTGTA 1380 

ATTAGCTTCT GCGTATAACG GTCCAATTTT TTCAATTGCT TGATAAAACG TCTTGTTGGA 1440 

ACTAATTTAT ACAAATTATT CATCCTTCAA GCCTAAATCA TGCATCATTT CTTCCCAAGT 1500 

AATGGGTTCA ACTCCTTTTT CCAAGTCTTC TAAATACTCT TGATAGGCTA AATCTGCCAC 1560 
ACGAGCATCG TATTCATCTT CTAGGGCTTC AAGAGTTTTG GTGCGAATAA GTTCCGAAAG 1620

GGAAACTCCT TCAAACTTAG CCATTGCTTT CATAAATGTT TTATCAGCTT CAGAAACTTT 1680 

TAATGTAATA GTAGTCATCT TTTGTGCTCC CTTTTTTAAT GGTAACACCA TTGTATTACT 1740 

TTTTAGGTGT TCAGTCAATA TAAAAAGAAC ACCTTCTCAG CGTTCTTTCT ATATCTCTGT 1800

CAATGGTGTT GCGGTATCTG GTGAGGTATC ATAAACCTTA AAGTCTACTC CGACTCCCAG 1860

ATCAGCTTGA GCCAGCTGAT TGACCATGGT CATATGAGCC AGTTCCTTGA TATTGTTTTC 1920 

CTTAGATAAA TGCCCAAGGT AAATCTTCTT AGTACGATTT CCTAGCGTCC GAATCATAGC 1980 

TTCAGCACCG TCCTCGTTAG AAAGGTGACC AAGGTCAGAT AGGATTCGTT GTTTGAGTCG 2040 

CCAAGCGTAA GAACCTGATC GCAAAATCTC TACATCATGG TTGGGCTCGA TAAGATAACC 2100 

ATCCGCATTT TCGACAATGC CCGCCATACG GTCACTGACA TAACCTGTAT CTGTCAAGAG 2160 

GACAAAACTC TTATCATCCT TCATAAAGCG ATAGAACTGC GGTGCGACTG CATCATGGCT 2220 

TACACCAAAA CTCTCGATGT CGATATCTCC AAAGGTTTTG GTTTTACCCA TTTCAAAAAT 2280 

ATGCTTTTGC GAAGAATCCA CCTTGCCAAG ATATTTACTA TTTTCCATAG CTTGCCAGGT 2340 

CTTTTCATTG GCATAAAGAT CCATACCATA CTTGCGAGCC AAAACGCCTA CTCCATGGAT 2400 

ATGATCTGAA TGCTCATGGG TAATCAAGAT GGCATCCAGG TCTTCTGGCT TACGGTTAAT 2460 

TTCAGCTAGC AGACTGGTAA TTTTCTTGCC AGACAAGCCT GCATCTACTA AAAGCTTCTT 2520 

TTTTGAGGTT TCCAGATAAA AAGAATTTCC ACTGGAACCC GACGCTAAAA TACTGTATTT 2580 

AAAGCCTATT TCACTCATTC TAGTCTTCTA CTTCATCCTC CCATACTTCT TCTTTCACTG 2640 

CATCCTTATC ATAAGGGAGT ACAATGGTAA AGGTTGAACC CTTGCCGTAT TCACTCTTGG 2700 

CCCAAATAAA GCCCTTATGT TGTTTGATAA TTTCTTTAGC GATAGACAGT CCTAGACCTG 2760 

TACCACCTTG TGCACGACTT CTAGCACGAT CCACACGATA GAAACGGTCA AAGATACGTG 2820 

GTAAATCCTG CTTAGGAATC CCCAAACCGT GGTCAGAAAT GGATAAAATC ATCTGGTCTT 2880 

CAGTTGTCTT CATTCTGACA GTGATTTTAC CCCCATCTGG CGAATACTTA ATAGCATTAT 2940 

TTAAAATATT GTCGACAACC TGCGTCATGT TATCTGTATC AATTTCCATC GAGATAGAAT 3000 

TGATGGGATA ATCTCTCACC AACTCATATT TTTTCTCCTT TTCCTGTCCT TTCATCTTGT 3060 
CAAAACGATT GAGGATAAAG GTAATAAAAG CAGTGAAGTT AATCAGTTCC ACATCTAGGT 3120

GACTGGTAGC ATTATCAATA CGTGAAAGAT GGAGGAGATC CGTCACCATG CGCATCATAC 3180 

GGTTGGTCTC ATCAAGAGAA ACCTTGATAA AGTCTGGTGC TACAGTTTCA CACAAAGCCC 3240 

CCTCATCCAA GGCTTCAAGA TAGGATTTTA CGCTAGTCAG AGGAGTCCGT AACTCATGGC 3300 

TAACATTGGA AACAAAGAGT CTTCGTTCGC GTTCTTCCTT CTCCTGCTCC GTCGTATCAT 3360 
GCAAAACAGC CACCAAACCT GAAATAAAGC CAGACTCTCG ACGTATCAAG GCAAAGCGAA 3420

CTCGAAGGTT CAAATATTCG CCATTGATAT CTTGGGAATC TAGCAACAAT TCTGGACTTT 3480 

GGGTAATCAA ATCACGCAAT TCATAGTTTT CTTCTATCTT GAGCAATTCC AAAATGCTTC 3540 

TATTCAGAAC ATCTTCCTTA ACCAACCCCA GTTGCTTCTT GGCTGTATCG TTAATCATGA 3600 

TAATCTGACC CCGACGGTTA GTCGCAAGAA CCCCATCTGT CATATAAAAC AGAATACTAT 3660 

TTAGCCTCTT ACTCTCTTGT TCTAGATTTT CCTGAGTGAG ACGAATAACC TCCGACAAGT 3720 

CATTCAAATT ATTGGTAATA TTGGTGATTT CAGACCCACC TTGCATATCA AGAACCTTGG 3780 

AATAATCTCC TGCAATCAAA TCTTTAACCT TTTGATTGAC TTGCTTCAAC TGAATATTAT 3840 

CACGTCTATT TTCCAGTAAT AAGAGGGTCA CAACAAGGAT GAAACCTAAC AAAATCAGGA 3900 

TAAAGATAAA ATCTCTGGTA AAAATGGTTT GTTTCAGTAA ATCAAGCATT ATTTCTCATG 3960 

TAATACCCTA CACCACGGCG CGTCAAGATA TACTCTGGTC GGCTGGGCGT ATCTTCAATC 4020 

TTCTCACGCA GACGTCGTAC AGTCACATCA ACTGTACGGA CATCACCAAA ATAGTCATAA 4080 

CCCCAGACAG TCTCAAGCAA GTGTTCGCGC GTGATGACTT GACCTGTATG CGATGCTAAA 4140 

TGATACAAAA GCTCAAATTC ACGATGGGTT AAGTCTAGTT CTTCGCCATA TTTTTTAGCC 4200 

ACGTAGGCGT CTGGAACAAT TTCTAAATCC CCAATTTGGA TAGGTTGAGG TTTACTATCT 4260 

GCTTCCTGAC CATCTACTGG CATAGGTTGA GAACGACGCA GAAGAGCTTT AACACGCGCC 4320 

TGCAACTCAC GATTGGAGAA GGGTTTTGTT ACATAGTCAT CTGCCCCAAG TTCCAAACCG 4380 

ATAACCTTAT CAAATTCACT ATCTTTGGCT GAAAGCATAA GAATGGGCAC ACTGCTTGTC 4440 

TTACGAATGG TCTTAGCAAC TTCTAAACCA TCAATTTCTG GAAGCATCAA ATCCAGAATA 4500 

ATAATATCTG GTTGCTCTGC TTCAAATTGC TCTAGCGCTT CACGACCATT AAAAGCAGTT 4560 

ACAACTTCGT AACCTTCCTT GGTCATATTA AACTTGATAA TATCCGAGAT TGGTTTCTCA 4620 

TCATCTACAA TTAGTATTTT TTTCATATGT TCACCTTTTT CTCTACTATT ATACCAAAAA 4680 

AATAGTCAGA AGACACAATA GCTAGTCTTG GCTACTGTCT AAGTTGGCTT GTGCATAAAC 4740 

CTGCCAGATT TTTTGTTGGG GTTTGGCAAG TGGGTAATTC TTGAATTCTT CTGGTGAAAG 4800 

CCAGCGAACT TCCCTATCTG AAAAATCATG GAAGTCACTC ACCTGACCTG CTACAATCTG 4860 

TACATGCCAT TTTCGATGAC TAAAAACATG CTGGACTGTA TCAAAACAAA CATCAAGCCA 4920 

ATCAACATCT AGGTCATAGT CCTGCTGGAA ACTCTCTTCT GGACTGGGAC CAAAGTTCAC 4980 

ACTTTCTTCC GCAACCTGAT GAAAGAGGTC AAACTGCTCT TCTTGCGAAA AGTTATCAAC 5040 

TTCTATAAAG GGGAAATGCC AAAAACCTGC CAAGAGCTTT TCGCTTTCAT TTTTTTCAAG 5100 
TAAAAATTGT CCTTGAGAAT TTTTCACAAC TAAGGCTTTA AGATAAATAG GAACCGGCTT 5160

TTTCTTAGGA GATTTAATTG GATAACGGTC CATGGTTCCA TTCTGATATG CCGCACTAAA 5220

GTCCTTGACT GGGCTTTCTT CAGGTCTGGG ATTTACAGGA GACTCAATAT CAGACCCTAA 5280

GTCCATCAAG GCTTGATTAA AATCACCCGG ACGATCCGGA TTAATCAAGA TCTCCATCAT 5340

TGCCTGAAAA ATTTTTCGAT TACTTGGAAT CCCAATATCG TGGTTGACTT CAAACAGACG 5400

CGCCAAGACC CGCATGACAT TACCATCTAC AGCTGGCTCA GGCAAGTTAA AAGCAATACT 5460

GGAAATGGCT CCTGCTGTGT AAGGTCCAAT CCCTTTCAAG CTGGAAATTC CTTCATAGGT 5520

ATTTGGAAAT TGGCCACCAA AGTCAGTCAT AATCTGCTGG GCTGCAGCCT GCATATTGCG 5580

AACTCGAGAA TAATAGCCCA AGCCCTCCCA AGCTTTCAGT AAACTCTCCT CAGGCGCAGT 5640

TGCCAGACTT TCGACAGTTG GAAACCAGTC CAAAAATCTT TCGTAGTAAG GGATAACTGT 5700

ATCCACCCTG GTCTGCTGAA GCATGATTTC AGATACCCAG ATGTGATAAG GATTTTTACT 5760

TCTCCTCCAA GGCAAATCTC TTTTGTTTTC ATCATACCAA GCGAGAAGTT TCTCACGGAA 5820

AGAAATGACT TTCTCCTCCG GCCACATGAC GATACCGTAT TCTTTCAAAT CTAACATATC 5880

TCTAGTATAA CACAGAAGGT TTCACCTGTC TTTGTATCTG ATTTATAATA TTTTCAATAG 5940

ATAGTATATA ACTTTTCTAT CTACTTATAC TCAATGAAAA TCAAAGAGCA AACTAGGAAG 6000

CTAGCCGCAG GTTGCTCAAA ACACTGTTTT GAGGTTGTGG ATAGAACTGA CAGAGTCAGT 6060

ATCATATAcT ACGGCAAGGT GAAGCTGACG TAGTTTGAAG AGATTTTCGA AGAGTATAAA 6120

TCTTATTGAT GAACTGCTTG CAGTCTGAGA AAAAATGAGC TTGGATATTA TTTCCAAACT 6180

CACTTAAAGT CAATTTCAAT CCACTAGAAC AAGCCTAGTA CAGTTCCATC GCTTTCAACA 6240

TCCATGTTGA GAGCTGCTGG ACGTTTTGGA AGACCTGGCA TGGTCATAAC ATCACCAGTT 6300

AAGGCAACGA TGAAGCCTGC ACCTAATTTT GGTACCAATT CACGAATGGT AATTTCAAAG 6360

TTTTCTGGTG CTCCAAGCGC ATTTGGATTG TCTGAGAAAC TGTATTGAGT TTTAGCCATA 6420

CAGATTGGCA ATTTGTCCCA ACCGTTTTGA ACGATTTGAG CAATTTGTGT TTGAGCTTTC 6480

TTCTCAAAGT TCACTTTGCT ACCACGATAG ATTTCAGTGA CAATTTTTTC AATCTTTTCT 6540

TGGACAGAAA GGTCATTATC ATACAAACGT TTATAGTTAG CTGGATTTTC AGCAATTGTC 6600

TTAACAACTG TTTCGGCAAG TGCTACTCCA CCTTCTGCTC CATCAGCCCA GACACTAGCC 6660

AATTCAACTG GTACATCGAT TGAGGCACAG AGTTCTTTTA AGGCTGCAAT TTCAGCTTCT 6720

GTATCAGATA CAAATTCGTT AATAGCTACA ACTGCTGGAA TACCGAACTT ACGGATATTT 6780

TCAACGTGGC GTTTCAAGTT AGCAAAACCT GCACGAACTG CCTCTACATT TTCTTCAGTC 6840

AGAGCGTCTT TAGCCACACC ACCATTCATC TTAAGGGCAC GAAGGGTTGC GACAATAACA 6900

ACTGCATCTG GAGATGTTGG CAAGTTTGGT GTCTTGATAT CAAGGAATTT CTCAGCACCA 6960

AGGTCCGCAC CAAAACCAGC TTCAGTAACA GTGTAATCAG CCAAGTGAAG GGCTGTTGTC 7020

GTCGCCAAAA CAGAGTTACA GCCATGAGCG ATATTGGCAA ATGGACCACC GTGTACAAAG 7080

GCAGGTGTAC CGTAAATTGT CTGAACCAAG TTTGGCTTAA TAGCATCCTT CAAAATCAAA 7140

GCCAAGGCAC CCTCAACCTG CAAATCACCT ACAGAAACAG GCGTACGGTC ATAGCGATAA 7200

CCAATAACGA TATTCGCCAA ACGACGTTTC AAGTCCTCGA TGTCCGTTGC CAAGCAAAGA 7260

ATTGCCATGA TTTCTGAAGC AACTGTAATA TCAAAACCAT CCTCACGTGG AATACCGTTT 7320

AGAGGACCAC CAAGACCAAC AGTCACATGG CGGAGCGTAC GGTCGTTCAA GTCCACAACG 7380

CGTTTCCAGA GGATACGACG TTGATCAATT CCCAGCTCAT TCCCTTGGTG CAAGTGGTTG 7440

TCAATCAAGG CAGAAAGGGC ATTGTTGGCA GTTGTAATAG CATGCATATC TCCAGTAAAG 7500

TGGAGGTTGA TGTCTTCCAT TGGCAGAACT TGTGCATACC CACCACCAGC AGCACCACCC 7560

TTGATCCCCA TGACTGGACC AAGAGACGGT TCGCGGATAG CAATCATGGT TTTCTTGCCA 7620

ATCTTGTTCA AGGCATCCGC AAGACCAATG GTAAGCGTCG ACTTTCCTTC ACCTGCAGGT 7680

GTTGGGTTGA TGGCAGTAAC CAAGATCAAT TTACCGACTG GATTGCTCTC AACTGCACGA 7740

ATTTTATCAA AGCTGAGTTT AGCCTTGTAC TTTCCGTACA ACTCCAAATC GTCATAAGAA 7800

ATACCAAGTT TCTCTACAAC ATCAACAATT GGCTTCAACT CAATACTCTG TGCGATTTCA 7860

ATATCTGTTT TCATTCAAAA TTCCTCTAAC CTCTTATATG ATAATTCATT ATATCACAAA 7920

ACAAGATTTT TAACATCCTA AAACTCTCTA AACGTTCGTA AATATCTCTG TTTTTAAGAC 7980

TTTTAGAGTC CTTTCTTAAA TTTTATATGG CTTTATAGTT TGAAACTATA ATAAATCTTC 8040

GTTTTTACCA AAAATTTATC ACTTTCATTT TACTTACCGC TTATTTTTGT GTACAATAGT 8100

GCTATGAAAA TTTTAGTTAC ATCGGGCGGT ACCAGTGAAG CTATCGATAG CGTCCGCTCT 8160

ATCACTAACC ATTCTACAGG TCACTTGGGG AAAATTATCA CAGAGACTTT GCTTTCTGCA 8220

GGGTATGAAG TTTGTTTAAT TACGACAAAA CGAGCTCTGA AGCCAGAGCC TCATCCTAAC 8280

CTAAGTATTC GAGAAATTAC CAATACCAAG GACCTTCTAA TAGAAATGCA AGAACGTGTT 8340

CAGGATTATC AGGTCTTGAT CCACTCAATG GCTGTTTCTG ACTACACTCC TGTTTATATG 8400

ACAGGGCTTG AGGAAGTTCA GGCTAGCTCC AATCTAAAAG AATTTTTAAG CAAGCAAAAT 8460

CATCAGGCCA AGATTTCTTC AACTGATGAG GTTCAGGTTT TGTTCCTTAA AAAGACACCC 8520

AAAATCATAT CCCTAGTCAA GGAATGGAAT CCTACTATTC ATCTGATTGG TTTCAAACTG 8580

CTGGTTGATG TTACCGAAGA TCATCTGGTT GACATTGCAC GAAAAAGTCT TATCAAGAAT 8640

CAAGCAGATT TAATCATCGC GAATGACCTG ACTCAAATTT CAGCAGATCA GCACCGAGCT 8700

ATATTTGTTG AGAAAAATCA GCTTCAAACA GTCCAGACTA AAGAAGAAAT TGCAGAACTC 8760 

CTCCTTGAAA AAATTCAAGC CTATCATTCT TAGAAAGGAA AACTATGGCA AACATTCTCT 8820 

TGGCTGTAAC GGGTTCAATC GCCTCTTATA AGTCGGCAGA TTTAGTCAGT TCTCTAAAAA 8880 

AACAAGGCCA TCAAGTCACT GTCTTAATGA CTCAGGCTGC TACAGAGTTT ATCCAACCTT 8940 

TGACACTACA GGTACTCTCA CAGAATCCTG TCCACTTGGA TGTCATGAAG GAACCCTATC 9000 

CTGATCAGGT CAATCATATC GAACTTGGAA AAAAAGCAGA TTTATTTATC GTGGTACCTG 9060 

CAACTGCTAA CACTATTGCA AAACTAGCTC ACGGATTTGC GGACAACATG GTAACCAGTA 9120 

CAGCTCTAGC CCTACCAAGT CATATTCCCA AACTAATAGC TCCTGCTATG AATACAAAAA 9180 

TGTATGACCA TCCAGTAACT CAGAATAATC TGAAAACATT AGAAACTACG GCTATCAGCT 9240 

GATTGCTCCT AAGGAATCCC TACTAGCTTG TGGAGACCAC GGACGAGGAG CTTTAGCTGA 9300 

CCTCACAATT ATTTTAGAAA GAATAAAGGA AACTATCGAT GAAAAAACGC TCTAATATTG 9360 

CACCCATTGC TATCTTTTTT GCTACCATGC TCGTGATACA CTTTCTGAGC TCACTTATCT 9420 

TTAACCTTTT TCCATTTCCA ATCAAACCGA CCATTGTTCA TATTCCTGTC ATTATTGCCA 9480 

GCATTATTTA TGGTCCACGA GTTGGGGTTA CACTTGGATT TTTGATGGGA TTACTTAGCT 9540 

TGACGGTTAA CACGATTACG ATTCTACCGA CAAGCTACCT CTTCTCTCCC TTCGTACCAA 9600 

ACGGAAACAT CTACTCAGCT ATCATTGCCA TCGTCCCACG TATTTTGATT GGTTTAACTC 9660 

CTTACTTAGT CTATAAACTG ATGAAAAACA AGACTGGTCT GATTTTAGCT GGAGCCCTTG 9720 

GTTCcTTGAC AAATACTATC TTTGTCCTTG GAGGAATCTT CTTCCTATTT GGAAATGTTT 9780 

ATAATGGAAA TATCCAACTT CTTCTGGCAA CCGTTATCTC AACAAATTCA ATTGCTGAAT 9840 

TGGTCATTTC TGCAATTCTA ACCCTAGCCA TTGTTCCACG ACTACAAACC TTGAAAAAAT 9900 

AAAAACAGG 9909 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 1126 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

TAATTTTCAT ATAATAGTAA AATAGAATGT GTGATTCAAT AATCACCTCA AATAGAAAGG 60 

AAATTCTATG TCAAATCTAT CTGTTAATGC AATTCGTTTT CTAGGTATTG ACGCCATTAA 120 
TAAAGCCAAC TCAGGTCATC CAGGTGTGGT TATGGGAGCG GCTCCGATGG CTTACAGCCT 180

CTTTACAAAA CAACTTCATA TCAATCCAGC TCAACCAAAC TGGATTAACC GCGACCGCTT 240

TATTCTTTCA GCAGGTCATG GTTCAATGCT CCTTTATGCT CTTCTTCACC TTTCTGGTTT 300

TGAAGATGTC AGCATGGATG AGATTAAGAG TTTCCGTCAA TGGGGTTCAA AAACACCAGG 360

TCACCCAGAA TTTGGTCATA CGGCAGGGAT TGATGCTACG ACAGGTCCTC TAGGGCAAGG 420

GATTTCAACT GCTACTGGTT TTGCCCAAGC AGAACGTTTC TTGGCAGCCA AATATAACCG 480

TGAAGGTTAC AATATCTTTG ACCACTATAC TTACGTTATC TGTGGAGACG GAGACTTGAT 540

GGAAGGTGTC TCAAGCGAGG CAGCTTCATA CGCAGGCTTG CAAAAACTTG ATAAGTTGGT 600

TGTTCTTTAT GATTCAAATG ATATCAACTT GGATGGTGAG ACAAAGGATT CCTTTACAGA 660

AAGTGTTCGT GACCGTTACA ATGCCTACGG TTGGCATACT GCCTTGGTTG AAAATGGAAC 720

AGACTTGGAA GCCATCCATG CTGCTATCGA AACAGCAAAA GCTTCAGGCA AGCCATCTTT 780

GATTGAAGTG AAGACGGTTA TTGGATACGG TTCTCCAAAC AAACAAGGAA CTAATGCTGT 840

ACACGGCGCC CCTCTTGGAG CAGATGAAAC TGCATCAACT CGTCAAGCCC TCGGTTGGGA 900

CTACGAACCA TTTGAAATTC CAGAACAAGT ATATGCTGAT TTCAAAGAAC ATGTTGCAGA 960

CCGTGGCGCA TCAGCTTATC AAGCTTGGAC TAAATTAGTT GCAGATTATA AAGAAGCTCA 1020

TCCAGAACTG GCTGCAGAAG TAGAAGCCAT CATCGACGGA CGTGATCCAG TCGAAGTGAC 1080

TCCAGCAGAC TTCCCAGCTT TAGAAAATGG TTTTtCTCAA GCAACT 1126



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2520 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

CCGGCAACAA AAAAGAAAAA ATCAACAGTT AAAAAAAATC TAGTCATCGT GGAGTCGCCT 60 

GCTAAGCCAA GACGATTGAA AAATATCTAG GCAGAAACTA CAAGGTTTTA GCCAGTGTCG 120 

GGCATATCCG TGATTTGAAG AAATCCAGTA TGTCCGTCGA TATTGAAAAT AATTATGAAC 180 

CGCAATATAT TAATATCCGA GGAAAAGGCC CTCTTATCAA TGACTTGAAA AAAGAAGCTA 240 

AAAAAGCTAA TAAAGTTTTT CTCGCGAGTG ACCCGGACCG TGAAGGAGAA GCGATTTCTT 300 

GGCATTTGGC CCATATTCTC AACTTGGATG AAAATGATGC CAACCGTGTG GTCTTCAATG 360 
AAATCACCAA GGATGCAGTC AAAAATGCTT TTAAAGAACC TCGTAAGATC GATATGGACT 420

TGGTCGATGC CCAACAAGCT CGTCGGATCT TGGATCGCTT GGTAGGGTAT TCGATTTCGC 480 

CTATTTTGTG GAAGAAGGTC AAGAAGGGCT TGTCAGCAGG TCGCGTTCAG TCCATTGCCC 540 

TTAAACTCAT CATTGACCGT GAAAATGAAA TCAATGCCTT CCAGCCAGAA GAATACTGGA 600 

CAGTTGATGC TGTCTTTAAA AAGGGAACCA AACAATTTCA TGCTTCCTTC TATGGAGTAG 660 

ATGGTAAAAA GATGAAACTG ACCAGCAATA ACGAAGTCAA GGAAGTCTTG TCTCGTCTGA 720 

CGAGTAAAGA CTTTTCAGTA GATCAGGTGG ATAAGAAAGA GCGCAAGCGC AATGCTCCTT 780 

TACCCTATAC CACTTCATCT ATGCAGATGG ATGCTGCCAA TAAAATCAAT TTCCGTACTC 840 
GAAAAACCAT GATGGTTGCC CAACAGCTCT ATGAAGGAAT TAATATCGGT TCTGGTGTTC 900

AAGGTTTGAT TACCTATATG CGTACCGATT CGACTCGTAT CAGTCCTGTA GCGCAAAATG 960 

AGGCGGCAAG CTTCATTACG GATCGTTTTG GTAGCAAGTA TTCTAAGCAC GGTAGCAAGG 1020 

TCAAAAACGC ATCAGGTGCT CAGGATGCCC ATGAGGCTAT TCGTCCGTCA AGTGTCTTTA 1080 

ATACACCAGA AAGCATCGCT AAGTATCTGG ACAAGGATCA GCTTAAGCTA TATACCCTTA 1140 

TCTGGAATCG TTTTGTGGCT AGCCAGATGA CAGCGGCCGT TTTTGATACC ATGGCTGTTA 1200 

AATTGTCTCA AAAAGGGGTT CAATTTGCTG CCAATGGTAG TCAGGTTAAG TTTGATGGTT 1260 

ATCTTGCCAT TTATAATGAT TCTGACAAGA ATAAGATGTT ACCGGACATG GTTGTTGGAG 1320 

ATGTGGTCAA ACAGGTCAAT AGCAAACCAG AGCAACATTT CACCCAACCG CCTGCCCGTT 1380 

ATTCTGAAGC AACACTGATT AAAACCTTAG AGGAAAATGG GGTTGGACGT CCATCAACCT 1440 

ACGCGCCAAC CATTGAAACC ATTCAGAAAC GTTATTATGT TCGCCTGGCA GCCAAACGTT 1500 

TTGAACCGAC AGAGTTGGGA GAAATTGTCA ATAAGCTCAT CGTTGAATAT TTCCCAGATA 1560 

TCGTAAACGT GACCTTCACA GCTGAAATGG AAGGTAAACT GGATGATGTC GAAGTTGGAA 1620 

AAGAGCAGTG GCGACGGGTC ATTGATGCCT TTTACAAACC ATTCTCTAAA GAAGTTGCCA 1680 

AGGCTGAAGA AGAAATGGAA AAAATCCAGA TTAAGGATGA ACCAGCTGGA TTTGACTGTG 1740 

AAGTGTGTGG CAGTCCAATG GTCATTAAAC TTGGTCGTTT TGGTAAATTC TACGCTTGTA 1800

GCAATTTCCC AGATTGCCGT CATACCCAAG CAATCGTGAA AGAGATTGGT GTTGAGTGTC 1860 

CAAGCTGTCA TCAGGGACAA ATTATTGAGC GAAAAACCAA GCGTAATCGC CTATTCTATG 1920 

GTTGCAATCG CTATCCAGAA TGTGAATTTA CCTCTTGGGA CAAGCCTGTT GGTCGTGACT 1980 
GTCCAAAATG TGGCAACTTC CTCATGGAGA AAAAAGTCCG TGGTGGTGGC AAGCAGGTTG 2040

TTTGTAGCAA AGGCGACTAC GAGGAAGAAA AGATGGCTCT TTGTCAACTG TAGTGGGTTG 2100 

AAGTCAGCTA AGCTCGAGAA AGGACAAATT TTGTCCTTTC TTTTTTGATA TTCAGAGCGA 2160 
TAAAAATCCG TTTTTTGAAG TTTTCAAAGT TCCGAAAACC AAAGGCATTG CGCTTGATAA 2220

GTTTGATGAG ATTATTGGTC GCTTCCAATT TGGCGTTAGA ATAGTGTAGT TGAAGGGCGT 2280 

TGACGATTTT CTCTTTGTCC TTTAGAAAGG TTTTAAAGAC AGTCTGAAAA AGAGGATGAA 2340 

CCTGCTTTAG ATTGTCCTCA ATGAGTCCGA AAAATTTCTC CGGTTCCTTA TTCTGAAAGT 2400 

GAAACAGCAA GAGTTGATAG AGCTGATAGT GATGTTTCAA GTCTTGTGAA TAGCTCAAAA 2460 

GCTTGTTTAA AATCTCTTTA TTGGTTAAAT GCATACGAAA AGTAGGGCGA TAAAAATGTT 2520 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 10993 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

TTTTCTCGAT AATAACTTCC ACCTTATTAT TTGGGATACC CTCCTCTTCT TCACCACCAC 60 

GTTCATAGTA GTCATCGCGA TAGAGAAAAG CTACGATATC AGCGTCCTGC TCAATAGACC 120 

CAGATTCACG AATATCAGAC AAGACCGGTC TCTTGTCCTG ACGTTGTTCT ACACCACGAG 180 

AAAGCTGACT CAGAGCGATT ACTGGAACCT TCAATTCCTT GGCTAGTATT TTCAACTGAC 240 

GAGAAATTTC AGAAACTTCT TGTTGACGAT TTTCTCGACC AGTTCCCGTG ATAAGTTGCA 300 

AATAGTCTAT CAAAATCAAA CCAAGATTTC CAGTTTCTTG AGCCAATTTA CGAGAACGAG 360 

AACGAATCTC TGTAATCCGA ATACCTGGCG TATCATCGAT ATAGATACTG GCGTTAGcTA 420 

GATTACCCTG AGCAATAGTA TATTTTTGCC ACTCCTCATC TGTCAATTGC CCTGTACGGA 480 

TAGAATGTGA CTCCACTAAG CCTTCTGCAG CTAACATACG ATCTACCAAG CTTTCCGCAC 540 

CCATTTCGAG TGAAAAAATA GCAACCGTTT TGTCCAACTT AGTCCCAATG TTCTGAGCGA 600 

TATTCAAGGC AAATGCTGTC TTACCAACTG CTGGACGAGC TGCTAAGATA ATCAACTCCT 660 

CCTCATGAAG TCCTGTTGTC ATATGATCCA AATCACGATA ACCTGTCGCA ATACCTGTAA 720 

TATCGGTCGT TTGTTGCGAG CGAGCTTCCA GATTTCCAAA GTTGAGATTC AACACATCTC 780 

GAATGTTCTT AAACCCGCTT CGATTTGCAT TTTCACTGAC ATCAATCAAC CCTTTTTCTG 840 

CCTGAGCAAT AATTTCATCA GCTGGTTGTG ACGCTTCGTA AGCTTGGTTG ACAGACTCTG 900 

TCAACTTGGC AATTAAACGA CGTAGCATTG CTTTTTCTGC AACAATCTTA GCATAATACT 960 

CCGCATTAGC AGAAGTTGGC ACAGAATTAA CAATCTCAAC CAAGTAAGAC AAGCCACCAA 1020

TATTCTGTAA ATCACCTTGA TTATCAAGGA TAGTACGAAC CGTTGTTGCA TCTATGGCAT 1080

CACCACGATC GGATAAATCG ACCATGGCTT GGAAAATCAA ACGATGGGCA TACTTAAAAA 1140

AGTCCCGAGA CTCAATGTAT TCTCGCACAA AAACAAGTTT ACTCTCATCA ATAAAGATAG 1200

CCCCTAAAAC GGATTGCTCA GCTAAGATAT CTTGAGGTTG TACTCGTAAC TCTTCTACTT 1260

CTGCCATCAG ACTTCCCTTC CTTTTACAAT CTTGTCAAGA AGGTGTAAAC TTATCCTTCT 1320

TTCACACGAA GATTGATTAC ACTTGTGATA TCTTGATAGA TTTTCACTGG CACATCAATC 1380

AAACCAACCG CTCGAATCGG AGCTTGTACT TGAATATGAC GTTTATCAAT CTTAATTCCA 1440

AATTGCTTTT GCAATTCTTC TGCAATCTTC TTATTGGTAA TAGAACCAAA GGTACGACCA 1500

TCTGGACCAA CTTTTTCAAC AAATTCTACA ACAGTTTCTT CTGCTTCAAG TTGTGCTTTA 1560

ATTGCTTTTC CTTCTGCAAT CATCTCAGCG TGAGCTTTTT CTTCCGATTT TTGTTTACCA 1620

CGAAGTTCAC CTACAGCTTG AGCAGTCGCT TCTTTGGCTA GATTCTTTTT GATAAGAAAG 1680

TTTTGCGCAT ACCCTGTTGG TACTTCCTTA ATTTCGCCTT TTTTACCTTT TCCTTTAACA 1740

TCTGCTAAAA AGATTACTTT CATTCTTCTT TCTCCTTTTC CTTCATTTCA TTTAATACAA 1800

TTTCTGTCAG TTTTTCACCT GCTTCTGACA AGGTTACATC TTTAATTTGA GCTGCTGCCA 1860

AATTAAAGTG GCCTCCACCG CCTAACTCTT CCATAATCCG TTGTACATTC AGTTTACTAC 1920

GACTTCGAGC TGAGATAGAG ATAAATCCTT GTGTATTCTT CGCAAGAACA AAACTCGCTT 1980

CAATACCTGA CATGGCTAAC ATGGCATCTG CTGCCTTACT AATAACAACT GTATCATAGC 2040

ATTTCATGTC CTTAGCCTCT GCTATTAGTA CATCTGAACC TAATTTACGC CCCTGTAAAA 2100

TAAGTTCATT GACCTCACGA TATTCTTCAA AATCTGTCGC AGCGATTTCC TGGATAGCAA 2160

TACTATCACT TCCGCGCGTT CTGAGATAGC TAGCAACATC AAATGTCCGA CTAGTTACTC 2220

GCGAGGTGAA ATTTTTAGTA TCCAACATCA TACCAGCCAT CAAGACACTT GCTTGCATAC 2280

GACTCAAACG ATTTTTCTTA GAATTCTGGA ACTGAATCAA TTCCGTTACC AACTCACTGG 2340

CACTACTTGC ACCACTTTCG ATATAAGTAA TAACCGCATT ATCTGGAAAA TCCTGATCCC 2400

TTCTATGGTG GTCAATAACA ATGGTTTGGG TAAATAAATC ATAAAATTCT TTTGATAATG 2460

TTAAGGCTGT CTTTGAATGG TCTACAAGAA TCAACAAAGA ACGATTGGTC ACCATCCCCA 2520

TTGCATCCTT AACAGACAAC AACTTCGTAA CTCCTTCTTT TTCTATGAAT GAAACAGCTC 2580

GTTCAATATC TGGAGACATT TGTTCTTCAT CATAAAGAGC ATAGCTATTT TCAATCACAT 2640

TGCTGGCGAA CAACTGCATA CCTACAGCAG AGCCCAAAGC ATCCATGTCT AAATTTTTGT 2700

GACCGACTAC AAAAACCTGA TCTACACTCC GAATCTTATC TGAAATAGCT GTCATCATAG 2760

CGCGCGTACG AGTCCGTGTA CGCTTGATTG AAGCAGCAGA CCCACCACCA AAATAAACTG 2820

GATTTTTCGT TTCGTCGTTT TCCTTAACAA CCACCTGGTC GCCACCACGT ACTTCAGCCA 2880

AGTTCAAATT GAGGAAAGCA ACTTTCCCTA TCTCATCATG ATTTCCATCG CCATAAGAAA 2940 

ATCCCATACT TAAGGTCAAG GGCAACTGTC TCTGTTTCGA CTCTTCTCTG AAAGCATCAA 3000 

TAACAGAAAA TTTATCATTC ATCAAGCCCT CAAGCACCGT GTAGTCAGTA AATAGATAAA 3060 

ATCGATCCAT ACTTACCCGA CGAGAAAACA TCATGTGTTT TTCTGAAAAC TCTGATATAA 3120 

AATTAGCTAC AAAACTATTG ATTTGACTAA TATCTGACTC AGAAGTTTCA TCCTCCAAAT 3180 

CATCATAATT ATCCACAGAG ACAATCCCAA TCACTGGTCT ACTTGTTACC AATTCATCTG 3240 

TTATGGCTTG TTCCCTGGAT ACATCTACAA AATACAAAAC ACCGGAAGAA GCATCCATAT 3300 

GAACAGCATA ACGCTTCTCA CCAAGCTTGG CATAAGTAGA CGGATTTCCT ACTGAAGCCT 3360 

TGATAATCGT TTGAACAGCT TCTAAATCAA AATCACCATC TTCCTTGGTC AAAATCAATT 3420 

CAGCATAGGG ATTAAACCAC TCAACCTCTC CAGAAGATAA ATTCAATTTC ATAACACCTA 3480 

CAGGCATCTG TTCCAATAGA GCTGTCAAAC TTTCTTCCGC TTGGTGGTTT ACATACTGTA 3540 

TCTGTTCTAC ATCACTCCTT GTATAATGCA CTCTCAGTTT CTTAAATAAA AAAACATAGC 3600 

CTCCTACAAA AAGAAACAAA ATTAAAACCG TCAACAGATT ATTATTAACA AAAATAATGA 3660 

AAGTGGATAA GACTCCAAAC GCAATCAATC CTACTAGAAT AGGAAAAATT GGACTTACAT 3720 

AAAATTTTTT CATTCAAAAC CTCTTGGCAC CCATTATACC ATAATACCCC TCAAAAAGCG 3780 

ACTTTTTAAA AGTGTAATCA GTAATTCTAT CAATTATAAG AAAAAGGTAG TTTACAATTC 3840 

AGTAAACCTA CCTTTACACA TATTGAAATT AAGATTCTTT AACCTCTAAC AAACCAATTT 3900 

CGCCATCCTC ACGACGATAA ATCACATTGG TTGTCTGATC TTCAACATCC ACATAGATAA 3960 

AGAAATCATG CCCCAATAAA TCCATTTGTA GAATTGCTTC TTCCAAATCC ATTGGTTTTA 4020 

AATCAATTTG TTTTGAACGA ACAACTTTAG ACTGGACAAT ATTTGAATCT TCCACCAAAG 4080 

CATCTGTAAA TAATTGACCA GTTGCTACCT TATTTTTATT TTTACGCTCG ATTTTTGTTT 4140 

TATTTTTACG AATCTGACGT TCAATTTTAT CAGTTACAAG GTCAATTGAA CCATACATAT 4200 

CTTGAGATAC ATCTTCTGCG CGGAGAGTAA TAGATCCAAG CGGAATCGTT ACTTCCACTT 4260 

TAGCCGTTTT TTCACGATAA ACTTTTAAGT TAATTCGGGC ATCCAACTCT TGTTCTGGTT 4320 

GGAAGTACTT TTCGATCTTT TCGAGTTTAG AAACTACATA ATCACGAATT GCTTCTGTTA 4380 

CTTCTAGGTT TTCACCACGG ATACTATATT TAATCATATG AGTACCTTCT TTCTAAACAT 4440 

TTTTGTTTTT ATGATTTTAT TATAACGCTT TCATTCTATT TTTGCAAATT TTTTCCTCAT 4500 

CTTACAAGGG AAAATGTTTT TACATCCTTA GCACCAGCTT CTTCCAACAG TTTCTTAACA 4560

CGATTTATAG TTGCTCCTGT AGTATAGATA TCATCTATAA GTAGGATTTT TTTAGGAATA 4620

GTGACTCCAC TTTTAATAAA GAAAGGAAGT TCTGTCCCCA AGCGCTCTGA ACGATTTTTA 4680 

GAAGAACTGG CTCTCTCTTC TCTTTTCTCT AATAAATCCA GATACTCAAA GCCTGCTGCC 4740 

TCTACCAAGC CCTCAACCTG ATTAAATCCT CTATTAGCAT ATCTATCAGG ACTTAGGGGA 4800 

ATTACAACAA ATTGATACTC TTTGTACTTT TTCAACTCCT CACTTAAAAA TGAAGCGAAA 4860 

ACTTTTCTTA ACAGGAAGTC TCCATCAAAC TTATACCGAC TGAAAAAATC CTTCATAGCT 4920 

TGATTGTAAG TAAAAATCGC TCTATGACTG ACTTCAACTC CCTCTTTACA CCAAAGTTGA 4980 

CAATCTTGAC ACTTTGTTGA CAACTCTGTT TTCATACAAT TTGGACAGTT CTCTTCCCCA 5040 

ATTCTTTCAA AAGTAGAATC ACAGTCTGAA CAAAGACAAG AGTCATCATT CCTCAGAAGT 5100 

AAGAGACTAC TAAAAGTTAA AACAGTCTTC ATAGTCTGCC CACATAACAA GCACTTCATA 5160 

GACCAGCCTC CTTATTCATC ATCTGAATTT CCTTAATCGC CTTCTTGATT GAAGCATTTA 5220 

ACCCATCATG GAAGAAAAGC AAATCTCCTG TCGGTCTATC CATGCTTCGT CCAACTCGTC 5280 

CACCAATCTG AATCAAACTA GACTTGGTAA ACAAACGATG ATTGGCCTCT ACTACGAAAA 5340 

CATCCACACA AGGGAAGGTA ACTCCGCGCT CCAAGATTGT CGTACTGATA AGTATTGTCA 5400 

GTTCTCCATC TCGAAAAGCT TGTACTTGCT CTAATCGATC CTCTGTTACA GAAGATACAA 5460 

AGCCAATTTT CTCATTTGGA AATTGCTCCT GTAAGATTTC TGCTAACTGC TCCCCTTTCT 5520 

TAATTTCTGA AGCAAAAATG AGTAACGGAT AAGCTGTCTT TCTCTGCTTC TCAATATAGG 5580 

ACTTTAACTT TGGTGACAAA CGATTCTTGT CTAAGTAGCG ATTAAAATCC GATAACCAAA 5640 

TTGGTTTTGG AATAATCAAC GGATTTCCAT GAAACCGTCT CGGTAAATTC AGTCTTTTTA 5700 

GTTCTCCTAA ACGGACCTTT TTATCTAACT CATTGGTCGA AGTCGCTGTT AAAAAGATTC 5760 

TCAATCCATT CTCCTTTACA CTATTCTTGA CAGCGTGGTA AAGCATGGGA TTATCAACAT 5820 

AAGGAAAAGC ATCTACTTCA TCCACTATCA GCAAATCAAA AGCTTGATAA AACTTCAATA 5880 

ACTGATGGGT TGTTGCAACA ACTAGTGGTG TTCGAAAATA AGGTTCCGAT TCTCCATGTA 5940 

GCAAAGCTAT CCCGCAAGAA AAATCCTGTT GCAGGCGCTT GTACAGCTCC AAACAAACAT 6000 

CTATGCGAGG ACTAGCCAAA CACACTGCAC CACCCGCATT GATCACTTTA GCCACTACTT 6060 

GATAAATCAT TTCTGTCTTT CCAGCTCCTG TTACCGCATG AACTAAGGTT GGCTTTTGCT 6120 

TGTCTACTAC TTGAAGCAAT CCCTCTGACA CCTTCTCTTG AAAAGGAGTT AATTGGCCGC 6180 

GCCATTTGAG AACATCTTGC TTTGGAAAAT CCTCCTGCGG AAAATAGTAT AAAGTTTGAT 6240 

CACTTCTGAC TCGCTTCATC AGCAAGCACT CTCGACAATA GTAAGCACCG ATGGGCAAAT 6300 

ACCATTCTTC TAGAATAGTA CTATTACAGC GTTGACAGAA AAGTTTCCCC TTCTCCTTTC 6360

TCATTGCTGG AAGTTTCTCC GCCAACTGAC GTTCTTCTTC TGTTAATTCA TTCTCAGTAA 6420

ATAAACGACC GAGATAATCT AAATTTACTT TCATACTTCT TTATTCGTAA AAACTAGCAC 6480

TTTAGATGAT TTTTTAGTAC AATTAAATCA TGGAATTTAG GACAATTAAA GAGGACGGTC 6540

AAGTCCAAGA AGAAATCAAA AAATCTCGCT TTATCTGCCA TGCCAAGCGT GTTTATAGCG 6600

AAGAAGAGGC TCGTGACTTC ATTACTGCCA TCAAAAAAGA ACACTACAAA GCGACACATA 6660

ACTGCTCTGC CTTCATTATT GGAGAACGTA GTGAAATTAA ACGTACAAGT GATGATGGTG 6720

AGCCTAGTGG TACTGCTGGT GTTCCCATGC TTGGGGTACT AGAAAATCAC AATCTCACCA 6780

ATGTCTGTGT GGTCGTGACA CGCTACTTTG GTGGTATTAA ACTAGGCGCT GGAGGACTAA 6840

TTCGTGCTTA CGCCGGCAGT GTCGCCTTAG CTGTCAAAGA AATTGGTATT ATTGAAATAA 6900

AAGAACAGGC TGGCATTGCT ATTCAAATGT CTTATGCTCA GTACCAAGAG TACAGTAACT 6960

TCCTTAAAGA ACATGGTCTC ATGGAGCTGG ATACAAACTT TACAGATCAA GTCGATACGA 7020

TGATTTATGT TGATAAAGAA GAAAAAGAAA CTATTAAAGC TGCACTTGTG GAGTTTTTTA 7080

ATGGAAAAGT CACTTTAACT GACCAAGGTT TACGAGAGGT TGAAGTTCCT GTAAACTTAG 7140

TGTAAACAAT GAATAATACA GCGTTTCGTT GACATTCTCA CAACTACTTT AGCGAGCAAA 7200

ATAAAAAGAG GCGTACCAAA ATATACTAGA AAATGAAGCA ATTCAAACGA AACCTGATAT 7260

CGTTTTCCTT CACACCTATT TACTAGAATT AGCTGAACGC AATCACTTGA AAATTAATGA 7320

CTTTGATCTA TGATATATAG AAATGGTATG GATAGCGTTA TACTAAAGAT ATCTTATACA 7380

AAGAGGTATT CATATGTCTA TTTATAACAA CATTACTGAA TTAATCGGTC AAACACCGAT 7440

TGTTAAACTT AACAACATCG TGCCAGAAGG TGCTGCAGAC GTCTATATAA AGCTTGAAGC 7500

ATTTAATCCT GGTTCATCTG TAAAAGACCG TATTGCCCTT AGCATGATTG AAAAAGCTGA 7560

ACAAGATGGT ATTCTGAAAC CTGGTTCTAC TATTGTTGAA GCAACAAGTG GAAACACCGG 7620

TATTGGACTT TCATGGGTAG GTGCTGCTAA AGGGTATAAA GTCGTCATCG TTATGCCTGA 7680

AACTATGAGT GTAGAACGAC GTAAAATTAT CCAAGCTTAT GGTGCTGAAC TCGTCCTAAC 7740

TCCTGGTAGC GAGGGAATGA AAGGTGCTAT TGCTAAGGCT CAAGAAATCG CTGCTGAACG 7800

TGATGGTTTC CTTCCTCTTC AATTTGACAA TCCAGCTAAT CCAGAAGTAC ACGAAAGAAC 7860

AACAGGAGCT GAGATACTAG CTGCTTTCGG TAAAGATGGA TTAGATGCCT TTGTTGCTGG 7920

AGTAGGTACT GGTGGAACGA TTTCTGGTGT TTCTCATGCA CTCAAATCAG AAAATTCTAA 7980

CATTCAAGTT TTTGCAGTAG AAGCAGATGA ATCTGCTATT CTATCTGGTG AAAAACCTGG 8040

TCCTCACAAA ATTCAAGGTA TCTCAGCTGG ATTTATTCCT GATACACTTG ATACTAAAGC 8100

CTATGATGGT ATCGTTCGTG TAACATCAGA TGACGCTCTT GCACTCGGAC GTGAAATTGG 8160

TGGAAAAGAA GGCTTCCTTG TAGGGATTTC CTCAGCTGCA GCTATCTACG GAGCCATCGA 8220 

GGTTGCCAAA AAATTAGGTA CAGGTAAAAA AGTCCTTGCC CTAGCACCAG ATAACGGTGA 8280 

ACGTTATCTC TCTACAGCAC TTTATGAATT GTAACCGTCC AATAACGAAG TCTATTGAAA 8340 

AATCTCCAGA CTAGAGAACT CACGGATAGT TCCTAATCTG GAGATTTCTT ATTTGCACTT 8400 

TTCTTGTACA ACTTTAGTCC ATGGTAAATA GGCCTCTAAA ACCTCTTTGT TTACGAGAGT 8460 

TTCCACGTTT GGAAGACATT CTAGAAGATA GGATAGATAT TTCTCACTAT TTATAATGGA 8520 

TTGAAATAAG ATATGAACAA ATCGATTAGA ACATGATGGT AAAGCGTAAT CCCTTGTTTC 8580 

TCAGCTTTCC CAGACAAAAA AGTCCAATAG TAAGTCAGCT GACTATCACT CTCTAGCACC 8640 

CTATAAGAAG TTTCATCCGC ATGAAGTAAG GGCTGAGTCA ATAGTCTCTC TCGCAAGAGG 8700 

TTATAAAGGG GCTCCAAATA GTATTGACTC GTCTTGATAT GCCAATTAGA GATTTCCTTA 8760 

CGTGTGATTG GTAAACCCAT CCTAGCCCAA TCTTCTTCTT GGCGATAATT GGGTACCTTC 8820 

AGATTAAACT TCTGATGGAT GGTGTGAGCG ATAATAGAAG CTGAGCCAAA GTTATGCGCT 8880 

AAAGGGGCTT TAGGAATAGG AGCTTTCACA AGCTTATCCA GATGATTATC TTTTACTCGT 8940 

TATGGACAAT GCTATATGGC ATAAATCAAG TACCTTAAAG ATTCCGACTA ATATTGGCTT 9000 

TGCATTTATT CCTCCATACA CACCAGAGAT GAACCCCATT GAACAAGTGT GGAAAGAGAT 9060 

TCGTAAACGT GGATTTAAGA ATAAAGCCTT TCGAACTTTG GAAGATGTCA TACAAGGACT 9120 

GGAGAAGGAG GTGATAAAGT CCATCGTTAA TCGGAGACGG ACTAGAATGC TTTTTGAAAA 9180 

CAGATGAGTA TAAAAAGAAA GTCCTCATTT CAATAGAAAT CACGACTTTC TGATGAATTT 9240 

ATAGTAAAAT GAAATAAGAA CAGGATAGTC AAATCGATTT CTAACAATGT TTTAGAAGCA 9300 

GAGGTGTACT ATTCTAGTTT AAATCCACTA TATTTGGGGA GTGATAGAAA AGCCCTTCAT 9360 

CAGCCAATCT ACTTGTTCAG GTGCGAGAGC TTTGACATCC TTTTCTGTAC TGGACCAAGT 9420 

CAGTTTTCCG TTCTCAAAGC GTTTATATAA TATCCAAAAT CCTTGACCAT CCCAGTAAAG 9480 

AACTTTAAAG CGGTCTTTAC GTCCACCACA AAAGAGAAAG ACTTGATCGG AGAAAGGATC 9540 

CAATTCAAAG TGGGTTTTAA CTACATAGGC TAATGAGTCT ATTCCCTGCC TCATATCTGT 9600 

CTTGCCACAA ACAAGGTGAA CTTGACCTAA ATCACTTAGT TGAATTATCA TAGTACAATA 9660 

CCTTTCCTCC GATAATTATT TTTTATCTGG TATACTGGAA GTTGGGGAAT TAGGATAGAT 9720 

ACCTTGTTAT GACGCGCTTA CTATGAATTT GAAGTATAGT CTCCTAAATG CACTTAGCCC 9780 

TTATTATAGG GCTTTTTGTT TTAATTATTC TAATCGAGTG AGACTGGGGA AAAAACAATT 9840 

TCAGGAAAAA TCTAAGCCCT ATACAAAAAA GGAAGCAATT TGCTTCCTTT CTATTATTAG 9900

TTATTCAAGG CTGCTGCCAT TGTAGCTGCA ACTTCAGCTT CGAAGTCGTT TGCAGCTTTC 9960

TCGATACCTT CACCAACTTC AAAGCGAGCA AACTCAACTA CCGAAGCGTT AACTGATTCA 10020 

AGGTATGCTT CAACTGTCTT GCTGTCATCC ATGATGTAAA CTTGTGCAAG AAGTGTGTAA 10080 

GCTTGGTCAA CTTTAGTGTT ATCAAGCATG AAGCGATCCA TTTTACCTGG AATAATTTTG 10140 

TCCCAGATTT TTTCTGGTTT GCCTTCTGCA GCCAATTCAG CTTTGATGTC AGCTTCAGCT 10200 

TGAGCAATAA CATCATCAGT TAATTGAGCT TTTGATCCAT ACTTCAAGTG TGGAAGAGCT 10260 

GGTTTATTAA CCATTGCACG GCTTTCGTTG TCTTGGTCGA TAACGTGATT CAATTGTGCC 10320 

AACTCATCTT TAACGAATTG CTCATCCAAT TCTTTGTAAG AAAGAACTGT TGGTTTCATC 10380 

GCTGCGATGT GCATTGACAA TTGTTTAGCA AGTGCTTCGT CTCCACCTTC AACAACTGAA 10440 

ATAACACCGA TACGTCCACC GTTATGTTGG TATGCTCCAA AGTGTTGTGC GTCTGTTTTT 10500 

TCAATCAATG CAAAGCGACG GAATGAGATT TTCTCTCCGA TAGTTGCTGT TGCAGATACG 10560 

TATGCAGCTT CAAGAGTTTC ACCTGAAGGC ATTATCAAAG CAAGAGCTTC TTCGTTGTTA 10620 

GCAGGTTTTC CTTCAGCAAT GACTTTAGCT GTAGTATTTA CCAATTCAAC GAATTGAGCG 10680 

TTTTTTGCAA CGAAGTCAGT TTCAGCGTTT ACTTCAATAA CTGCTGCAAC ATTACCGTTA 10740 

ACATAAACAC CAGTCAAACC TTCTGCAGCA ACACGGTCAG CTTTCTTAGC TGCCTTAGCC 10800 

ATACCTTTTT CACGAAGCAA TTCAATCGCT TTTTCGATGT CACCGTCTGT TTCTACAAGC 10860 

GCTTTTTTAG CGTCCATAAC ACGGGCACCA GATTTTTCAC GCAACTCTTT TACAAGTTTA 10920 

GCTGTAATTT CTGCCATTTT AATTCTCCTA TATTTTTTGA AAATAGGAGA GCGCGGCTAA 10980 

GCCCCGCCTC CGG 10993 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8411 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY : linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

CGACGGGGAG GTTTGGCACC TCGATGTCGG CTCGTCGCAT CCTGGGGCTG TAGTCGGTCC 60 

CAAGGGTTGG GCTGTTCGCC CATTAAAGCG GCACGCGAGC TGGGTTCAGA ACGTCGTGAG 120 

ACAGTTCGGT CCCTATCCGT CGCGGGCGTA GGAAATTTGA GAGGATCTGC TCCTAGTACG 180 

AGAGGACCAG AGTGGACTTA CCGCTGGTGT ACCAGTTGTC TTGCCAAAGG CATCGCTGGG 240

TAGCTATGTA GGGAAGGGAT AAACGCTGAA AGCATCTAAG TGTGAAACCC ACCTCAAGAT 300

GAGATTTCCC ATGATTATAT ATCAGTAAGA GCCCTGAGAG ATGATCAGGT AGATAGGTTA 360 

GAAGTGGAAG TGTGGCGACA CATGTAGCGG ACTAATACTA ATAGCTCGAG GACTTATCCA 420 

AAGTAACTGA GAATATGAAA GCGAACGGTT TTCTTAAATT GAATAGATAT TCAATTTTGA 480

GTAGGTATTA CTCAGAGTTA AGTGACGATA GCCTAGGAGA TACACCTGTA CCCATGCCGA 540 

ACACAGAAGT TAAGCCCTAG AACGCCGGAA GTAGTTGGGG GTTGCCCCCT GTGAGATAGG 600 

GAAGTCGCTT AGCTTTAATC CGCCATAGCT CAGTTGGTAG TAGCGCATGA CTGTTAATCA 660 

TGATGTCGTA GGTTCGAGTC CTACTGGCGG AGTAATtGAT AAAAGGGaAC ACAGCTGTGT 720 

TCCTCTTTTT GTATCAATTT GTATCACCAA GCATTTTCAT AAGGAAGTCT GTTATTTCTT 780 

GAGAACTTTC TTTTTTTCCA TGTGCAATCC AAGTTTGGCA GACACCAAAA AGTGCATGAG 840 

TTAGATAGAT GCTACTATAT TCTAATTCAG TGGTATTTAG ATTCAGTTGC ATAAATCGCT 900 

TTTGTAAATC TGTACTAAGC ATGATATGAA GTTTATTTCG TAAGAAATTT TGGATTTCTT 960 

TAGTCCCATT TTCAGAAAGA AGGGCAGCCA GAAGTGGTTC TGACTCTAGA TATTCAAAAA 1020 

CTTCTAAAAT AGCGTCTCTT TTGTGATGAG CATGTTTTTG AAAAATATAT TCAAATGTAT 1080 

GGAATAGCTT GCTTTGATAG TGCTCAATCA TATCATACTT ATCCTTATAG TGAGTATAGA 1140 

AGCTGGAACG ACTAATTCCG GCTTTTTCTA CTAATTTGAC AGTAGAAATT TTATCAAATG 1200 

GCTGTTCCAT CAGTAATTGT ACCATAGCAT TTTCAATAGT TCGCTTTGTT TTTAAGCGTT 1260 

TGTTACTTTC TTGCATATTT CCTCCTTGTA AACAAATTAG ACTATATGTC TAAAAATAGA 1320 

TTTTTTATCT TGTAATTTAG ATTTTTTAAT GTATAATCTA TTATATCAAA ATTTTAGACA 1380 

ATATGTTTAA AAAAGGAGAA ACTAAGTTTA AAGAATGGAA AGCAATTTAA AAAAAACCAA 1440 

CCTTTATTAT TGTCATGATC GGGATTTCTC TTATTCCAGA TCTGTACAAT ATCATATTTT 1500 

TGTCATCAAT GTGGGATCCA TATGGGCAAT TGTCTGACTT ACCTGTGGCA GTTGTAAATA 1560 

ATGATAAAGA GGCTTCCTAT AATGGTAATA CTATGGCAAT AGGAAAAGAC ATGGTGTCCA 1620 
ATTCAAAAGA AAATAAAACC TTGGATTTTC ATTTTGTAGA TGAAGAGGAA GGAAAGAAGG 1680

GATTGGAAGA TGGCGATTAC TATATGGTAG TGACTTTACC AAGTGATTTA TCTGAAAAAA 1740 

CAACTACATT ATCCAATATT CAATCGACAG CAGCTTATCA ATCATTGACA AGTGAGCAAC 1800 

AAACTGAGAT AAGTGATTCT GTATCTCAAA ATTCAACTGA TAGTATTCAA TCGGCTCAGT 1860 

CAATTGTAGC TTTAGTACAA GATTTACAGG GAAGTTTAGA AAACTTACAA AATCAATCTT 1920 

CTAATCTTTC GACTTTAAAA AATCAATCTA ATCAAGTATC ACCTATTACT TCTACTTCTT 1980 

TGATAGGATT GTCAAGTGGA TTAACAGAGA TACAAGGAGA TGTTACTAGC AAATTAGTTC 2040

CTGCCAGTCA GTCGATTGCA TCAGGTGTAA ACGCATATAC TACAGGTGTT GATAAAGTTT 2100

CTCAGGGCGC AAGTCAACTA AGTGAAAAAA ATGCCACCTT GACAGGTAGT TTGGATAAAC 2160

TAGTTTCAGG CTCAAACACC TTGACACAAA AATCTTCTAG ATTGACAGCA GGAGTTGGTT 2220

AATTACAATC AGGATCTGGG CAATTAGCAG ACAAATCCAG TCAGTTACTT TCAGGTGCTT 2280

CTCCATTAGA GAATAGAGCT AATAAATTGG CAGATGGATC TGGGAAACTA GCAGAAGGTG 2340

GAACAAAGTT AACTTCTGGA TTGGAAGATT TACAGACAGG ACTTGCTTCT TTAGGACAAG 2400

GACTAGGTAA TGCTAGTGAT CAACTCAAAT CAGTATCAAC AGAATCTAAA AATGCAGAGA 2460

TTTTGTCAAA TCCACTCAAT CTTTCAAAAA CAGACAATGA TCAAGTTCCT GTAAATGGAA 2520

TCGCAATAGC TCCTTATATG ATATCAGTTG CTCTTTTTTT GCAGCAATAT CAACAAATAT 2580

GATATTTGCG AAATTGCCTT CAGGACGTCA TCCAGAGAGC CGTTGGGCTT GGTTGAAATC 2640

TTGAGCTGAA ATAAATGGTA TTATAGCTGT TTTGGCAGGA ATTTTGGTAT ATGGAGGAGT 2700

TCAGCTTATT GGTTTAACTG CTAATCATGA GATGAGAATA TTTATTCTCA TCATCCTAAC 2760

AAGTTTAGTA TTCATGTCTA TGGTGACCAC TTTAGCAACG TGGAATAGCC GTATAGGAGC 2820

TTTTTTCTCA CTTATTTTGC TTTTACTACA GTTAGCATCA AGTGCAGGTA CTTATCCACT 2880

TGCTTTGACA AATGATTTCT TTAGATCTAT TAATCCCTGG TTACCAATGA GCTATTCAGT 2940

TTCGGGATTA CGACAAACAA TCTCTATCAA CAAGTCATTT TCCTAGCTGT CATACTAGTT 3000

CTATTTACTA GTTTAGGTAT GCTAGCCTAT CAACATAAGA AAATGGAAGA AGATTAAAAA 3060

AATCGACCGA TTAACTGGTC GATTTTTTAT GCCTTAGATG ACTTTCGTCT GTGATTATAG 3120

ATTCCAAATA GTAAGAGAGA AGTAAAGGAA CAGATTGCTC CAGTAATAAA ACCATTGGGA 3180

ATGAAGGAAA GTGTAATAGT TCCTTTCCCC TTGGGAATGT CAACTTTCAT AAATCCAGTT 3240

TGAGCTTGTT TAATTTCTAT TTTCTTACCA TCTTGGTAGG CAGACCAACC TTTGTCATAA 3300

GGAATGGTGA AGAAAATAGA TGTATCTTGT TGGACATCAT ATGTAGCAAA AACCTTGTTT 3360

TTAGAAGTTG ATACTGTGAC AGGTTGTTCT TTAATTTTTT GAATTGCCTC GGTGAAAGTT 3420

TTGGTATCTA AACGATAGAA GGTAGGAGAT TCAAATGATA CTTGTGAATT TCCAGGGAAA 3480

CTAACATTGA TATTGAAAGT TTTTTTCTCT TTAGTATATC CTAGATTAAA GAAGGAGAAG 3540

ACATTATCAG TTGTAAAAGT CTTTTTTTCA CCATTTACAA GGATGTCAAC CTTCTTTTGT 3600

TTATCGTTAG AAAAGTGAAG GTTTATGAAA GAGAGATAAA CTTGGCTGTT TTCTGGAACT 3660

TCAATTTGAT ACTGGATTGC TGCATCTTCA TTTGAAGAAC TTGTGACACT AATCAAATCA 3720

TTAGTATTTT CTATTTTTTC TGTTTTTTCA TAAGGTATTG GAGAAAAATA ATCAAAATTG 3780

ACGTTAGCAA GTTGATTTAA AAATGAGGCC TGATTATCCA AGGTATGTTC ATTGAACTTG 3840

ACATCATTGT AAACAGATTG ACTCGCAACT GCAATCGGAA GAGAGTATTG ATTTTCATAT 3900 

AGGGTAAGAT TATCTTTTTG ATAGATATCT TTAAAGCCAT ACTTATCAAT AGGACTGTCT 3960 

GAGATATTGT ACTGGATACC AAATAAACTA TCAGCCAAAA TACTATTATT TGCATATCGG 4020 

AGATTGAGAT TAGTCCCAGA GGATTTAAAA CCAAGTTTAT CTAAAGTAGA GCTTGATGAA 4080 

CGATTTCGAA CAGATGAAAA TTGAGAGATT CCATTGTAGT TGAATTTCAT ACTGTCATTT 4140 

CCTGTCTGAG TTTGTAGTTT TTCAGTACGA GTAAATTGAT TTCCAATATA TGTTGAGAAA 4200 

GATTCCATAG CTGGGATATC TCGACTATAA GCACTTCGAG AAGCAAATCC CCATTCCTTA 4260 

GCAATTCCGT CCATTTGAGA TGAAGCATTT AAACTCATTT CAACCAGTAT AAATAAAGAG 4320 

ATTAGAATGG CAAATAGATT CACAGATATA AACTTTTTGA TAACTGCAAG GAGTAAAAGA 4380 

GAATAGACAA CCAAAAATTC AAGAGTAAGC AGAATATTCA AATCTGTTAA AAAAGAATAA 4440 

TGCGATTTTA GATAGATGGT AGCTAAAAAT CCTGCTACTA CAAGAAAAAG CGAAACTAAA 4500 

AAATTCCAGA CTTTAAGTTC TTTCAGACGC TTTAAGACTT CTGCTGCTGT GTAAATTAAC 4560 

AAGGTAGAGA AAATCCAAGC ATAGCGATGT AAAAACATGT TTGGAGTATG CATGCCTTGC 4620 

CAAAATAAGT CAAGAGCTTC TATGTAAAAG CTTGCAATTA GAAATGCAAA GAATATTACA 4680 

TATATGAGTT TCACGTGAAA CTTAATAGAT TTCAGCGTAA AAAATAAAAT GGTCAAAATA 4740 

AAGGGAAATA GTCCAACAAA AATCATTGGG ATGGCCCCAT ACTTTGTTGT GTCAAAGGAA 4800 

CCAATGAATT GCTTAGCAAA GAGATCAAGA TACCAGCTAC TTTCAGTTTG AAACTTTGTA 4860 

ACTTCAGTCA ATTTTTCCCC ATGTGTCTGT AAATCAAATA GAGTGGGAAG AGTCATAATC 4920 

AAACTAGCCA TACCAGCTAA AAAGGAGATA ACTATGAAAT CAAGAACAGA TGATTTTCGA 4980

GTCTTAAAGT CCCACGAAAT TTGACAGAGA TACCAGAAAA TAAGAAACAA TACTGTCATA 5040 

TATCCAAAAT AATAATTTTG AATAAATAAG ATTGACAGAC TTGTAAAGTA CAATAGGAGT 5100 

TTCTTTTCAG TTATCAGTAG ATGTAAACCA GTTATAATTA AAGGAATCAA GATAAAAACA 5160 

TCTAGCCAGG TTTTTATCTC TAATTGACTG ACAGTGAAAC TCATCAGAGC ATAGGAAGTA 5220 

GATAAGGCTA GTTTTAAAAT CTGAGGGATA GATTGAAACA ATTTATTCAA ACTAAAAAAG 5280 

GTTGACAGAC CAATCAATCC AAATTTTAAG AGAGTTGTCA GATAGATAGC ATCTGGCATA 5340 

TTCGTTAGAT CAAAAAAGTA AACCAGAGGC GCGAGAAAAC TACCCAAGTA ATAACTAGAT 5400 

AGGGCATAGA AGTTTAGCCC TAGACCACTT GTAAAGGTGT AAAACAGATT ACTATTTCCA 5460 

TGTAGGATAT TTCGTAAGGC TACATCAAAA ATAACGTATT GATGAAAGCC ATCTCCTAAT 5520 

AGAGGAGAGT TGTCGCTATT CCAGTAGATA CTTTGAGATA GATATACTCC AGACATAATC 5580

ACTACAGGAA TGATGAAAGA AATAAAATAG GTTCGATATG TTTTTAAAAA TGATTTCATG 5640

TTACCTCGTA GAATGATAGA AAACTCAGTT GGTTAACCCA ACTGAGTTTT GAAGTTTTAT 5700 

TTAGTCTTTC CAAAGTTCTT TAACTTTTGC TTGTACTTCT GCATTTTCTA GGAATTCATC 5760 

GTAGGTTTCA TCGATACGGT CAATGACGCC ATTTTTAGAT AAGACAATGA TATGGTTAGC 5820 

CAAAGTTTGA ATAAATTCGT GGTCATGGCT GGCAAAGATG ATTGATTCTT TAAAGTTTTT 5880 

CAATCCATCA TTCAAGCTTG AGATAGATTC CAAGTCCAAG TGATTTGTTG GATCATCAAG 5940 

TACAAGGACA TTTGATTTTA AGAGCATGAG TTTTGAAAGC ATGACACGAA CTTTTTCTCC 6000 

CCCTGACAAG ACATTTACAG GTTTGTTAAC TTCATCTCCA GAGAAGAGCA TACGGCCGAG 6060 

GAAGCCACGT AGGAAAGTAT TGTCATCTTC TTCTTTACTT GCGAATTGAC GCAACCAGTC 6120 

AAGAATTGAT TCTCCTCCTG CAAAATCAGC TGAGTTATCT TTTGGTAGGT AAGATTGACT 6180 

AGTTGTAACT CCCCACTTGA CAGTTCCTTC ATAGTCAATA TCTCCCATGA TTGCACGAAT 6240 

TAATGCAGTC GTTTGAATAT CATTTTGTCC AATAAGTGCT GTCTTATCAT CTGGACGCAA 6300 

GATGAAACTA ATATTATCCA AGATAGTTTC ACCATCAATC TTTACAGTTA AATTTTCTAC 6360 

TGTCAAGAGA TCATTACCAA TCTCACGTTC CGCTTTAAAG TTGATAAATG GATATTTACG 6420 

ACTAGATGGC ACAATCTCTT CTAGCTCAAT CTTATCAAGC ATTCTCTTAC GTGATGTTGC 6480 

CTGCCTTGAC TTAGAAGCAT TGGCAGAGAA ACGAGCAACA AATTCTTGCA ATTGTTTAAT 6540 

TTTTTCTTCT GCTTTAGCAT TACGGTCTGC TAGCAATTTA GCAGCAAGCT CAGAAGATTC 6600 

CTTCCAGAAG TCGTAGTTTC CGACATAGAG TTTGATTTTT CCAAAGTCAA GGTCGGCCAT 6660 

GTGAGTACAA ACTTTGTTTA AGAAGTGACG GTCGTGGGAT ACTACGATAA CTGTGTTATC 6720 

AAAGTCAATC AAGAAGTCTT CTAACCAAGT AATCGATTGG ATATCCAAAC CGTTAGTAGG 6780 

CTCGTCCAAG AGAAGAACAT CTGGTTTACC AAAAAGTGCT TTGGCGAGGA GAACCTTTAC 6840 

TTTTTCACCG TTGGCCAATT CGCTCATGTT TTGGTAGTGT AATTCTTCTG GAATGTTTAG 6900 

GTTTTGAAGT AGTTGAGAGG CTTCACTCTC TGCTTCCCAA CCTCCAAGTT CGGCAAACTC 6960 

TCCTTCGAGT TCGGCAGCAC GAACCCCGTC CTCGTCTGAG AAATCTTCCT TCATGTAGAT 7020 

AGCATCTTTC TCTTTCATGA TGCTATAAAG TTTTTCATTT CCCATGATAA CGACATCAAT 7080 

GGCACGTTCA TCTTCGTAGT CAAAGTGATT TTGACGAAGA ACAGAGAGAC GTTCATCTGG 7140 

ACCAAGAGAG ATGTGACCAG TAGTAGGTTC GATATCTCCA GCTAAAATTT TTAAAAAGGT 7200 

TGATTTTCCG GCACCATTAG CACCGATTAA TCCGTAAGTA TTTCCTTCTG TAAATTTGAT 7260 

ATTGACATCA TCAAAAAGTT TGCGATCACT AAAACGTAGT GAAACATCAG ATACTGTAAG 7320

CAATGTTTTT CTCCTATATG TGTAATATAT TTATTCTACT AGAAAATACA GAAATATTCA 7380

AATTTTTATT TGTCAATTTT GTGTAAATTA TATTTACAGT ATCCTTTACA CAAATCTGTA 7440 

AAAAGCAAGG CTGATTTATT TTGATAAATT ACGGTTATTT CATTAAAAAA ATGCTATAAT 7500 

TGAAAGGACT ATATCGAAGG AGAACAAAAT GACTAAAGCC ATTATTTTAA CAGGAGACCG 7560 

TCCAACAGGA AAATTGCATA TTGGACATTA TGTTGGAAGT CTCAAAAATC GAGTATTATT 7620 

ACAGGAAGAG GATAAGTATG ATATGTTTGT GTTCTTGGCT GACCAACAAG CCTTGACAGA 7680 

TCATGCCAAA GATCCTCAAA CCATTGTAGA GTCTATCGGA AATGTGGCTT TGGATTATCT 7740 

TGCAGTTGGA TTGGATCCAA ATAAGTCAAC TATTTTTATT CAAAGCCAGA TTCCAGAGTT 7800 

GGCTGAGTTG TCTATGTATT ATATGAATCT AGTTTCGTTA GCACGTTTGG AGCGAAATCC 7860 

AACAGTCAAG ACAGAGATTT CTCAGAAAGG ATTTGGAGAA AGCATTCCGA CAGGATTCTT 7920 

GGTCTATCCA ATCGCTCAAG CAGCTGATAT CACAGCTTTC AAGGCTAATT ATGTTCCTGT 7980 

TGGGACAGAT CAGAAACCAA TGATTGAGCA AACTCGTGAA ATTGTTCGTT CTTTTAACAA 8040 

TGCATATAAC TGTGATGTCT TGGTAGAGCC GGAAGGTATT TATCCAGAAA ATGAGAGAGC 8100 

AGGGCGTTTG CCTGGTTTAG ATGGAAATGC TAAAATGTCT AAATCACTAA ATAATGGTAT 8160 

TTATTTAGCT GATGATGCGG ATACTTTGCG TAAAAAAGTA ATGAGTATGT ATACAGATCC 8220 

AGATCATATC CGCGTTGAGG ATCCAGGTAA GATTGAGGGA AATATGGTTT TCCATTATCT 8280 

AGATGTTTTT GGTCGTCCAG AAGATGCTCA AGAAATTGCT GATATGAAAG AACGTTATCA 8340 

ACGAGGTGGT CTTGGTGATG TGAAGACCAA GCGTTATCTA CTTGAAATAT TAGAACGTGA 8400 

ACTGGGTCCG G 8411

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9064 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

TGCCGTACTC AAGTACAGCC TGCGCTAAGT TTCCTAGTTT GCTCTTTGAT TTTCATTGAG 60 

TATTAGTAAC CAAAATCCGA CCACATAGCC AGCCCCTATG AATATAGCCA TTAAAGCTAG 120 

CATGGAATTT AGGAAATTAA AAACCACCGC AGATACAAAG GTTAGCACAA AAACATTAAA 180 

AGCAATGGTG TCAGAAGCCA AGACTAGAAT ATAGGGTGTC AACCGATCTA AAGTTTTGGA 240 

ATCTAGGAAA AATAAGTGTT TATACATGAT GACCTCCTCT ATGGCTGAAA AGCAAGCCTT 300 
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ATATTATTGA 
CCAGCAAAGA 
GAAAAATGAG 
TGCTTAAAGA 
AGAAAGAACA 
AATACAGCCC 
ATCTGGAACA 
ATGCGGAAGA 
AGTAAACTCA 
TAGTTTTGGA 
ACTGCACTTA 
TCACCTTGTC 
ACCTGACTAC 
AATAAAATCA 
AGAATTTGAA 
TCCTTGATTT 
CCACTGTAAA 
AGAATATCCA 
ACCTCCATTC 
GAAAAGGATA 
AAACAAGCAG 
ATACCTCTAT 
TTGGTTCTAG 
AATGTATGTT 
GGTTAAGTCT 
CGAAACTATC 
AATCAAGAGC 
ATATCATTGT 



ACCCCAAGAC CCTATGTAGA 
TCACATGCAC CGCATAGGAT 
TGATTCCAAC TGTTGCAAAG 
GGAGAGCAAA TAAAATAGAA 
AAGCATGTTG CAGTAATCCT 
GGGCTATATA AATACCTAGC 
AACCTTCCGC AGTTGACTGA 
CTAGCACTAA TACTGTCAAA 
GATAACCATG GCCTGTCTTA 
AGATATTTTG AATCCAGAAT 
CGATAAGCGT CAGCTGAGAA 
TTTTGAATAG AAGTTGATAC 
AGGCTCTACT GCTGTAAGAT 
TAGATAATAG ATACATTAAG 
ACCTOGCATC CAAACCAAGA 
ACCATAAGGT TTTTCCAAAA 
TTACCGCCAC CCCTTTATTA 
GAACAAGCCA CCCAATAGAT 
ACACACTACT CAAGAAAATA 
ATTTATTTCA CTAACAATTT 
GAAAGCTACT TTTTATAATA 
AGAATACACC TATATAAGCG 
ACAAACAAAT GACAAACATA 
GACTAACCAA ATCATCATTT 
AGCACTGAAA AGCAAGACAG 
CTAAAAAAAT TATCTACTGA 
TTTTTCTTAT CCATAATTAT 
GATTTTTAAC ATAATGTAGC 
TTTTTAAAAT TTTTCATCCA 




AAAGTGAGCA AAAACGGGAA 
GGATAAATGC TCTTGGTATA 
ACGAAGATAT CTAACAGACT
GGAAGAAGCA AATCAAGAGC 
CTATAAATCA ATTCTTCCAT 
TCTGCAAAGT TAGTCCCACT 
ACATGTTTAG CTGTCTGAAC 
ATCGAATACC AAAGCCATTT 
ACAAGAACCA CAATCATGAC 
AAATTGCCTA TCTGAGAAGA 
AGACTAAATA CGAAAAATAA 
TTTTTCATAG AAATCCTCCC 
TAAGAAGACA GTTTGTTTTT 
GCATTAAAGA CAATGAAAAT 
TAAAGTTTGA TTATCAAAAA 
ATAAATTTAA AGCGATTTCG 
GCAAGAAGGA AAACTCCTGC 
ACGATAGAGA TTTGTAAAAA 
ACAAAAAATA ATCTGTATTT 
AATAGAGCCT TCTACTCAAA 
CTTCAAGCCC CACATGAGCA 
ATTAGTTGTT GATAGAATTC 
AAATCTGCCA AGCCGATAAA 
ACTTATATTT AAGAGTATCT 
GCCAATAATA TTTAAAATGA 
CACTACAAGA AATACTATAC 
TTACTCCTTT CCTAACAAAT 
AGCACCCGTT GCAACTTTGA
AATCTTGAAT TGTCATCGAA 



GGTCGCTACA 360 

GCGGGTCAAA 420 

AGGCAGGCTT 480 

AAATCGCGAA 540 

CAGTGGAACC 600 

ATAACCAATC 660 

GTTAAAAGAG 720 

TTTTCTTGGA 780 

TCCAATAAAA 840 

AAATTGCCAA 900 

GTAAGAGAAG 960 

TACTATGACC 1020 

TTTAAGGCTA 1080 

ATGTCCATAG 1140 

GATGAGCAAA 1200 

AATATCTACT 1260 

TTCAAACAAA 1320 

TGTCCCTAAA 1380 

CATATTAAAT 1440 

TATCCTGTCA 1500 

GAAGCGTGAT 1560 

TGTTTCTGAA 1620 

CATAAGTTGA 1680 

CTTTTATTTT 1740 

ACAGTAACGG 1800 

ATATTATAGT 1860 

CCAGCTTATC 1920 

CAAGTTTAGT 1980 

ACATCTTGAA 2040

TTGTTAAAAA ATTTAAAAAG TAAGCATTAA AAACATACTT TCCTCTTTAT ATTGTATTGA 2100

TACCAACTTG TTTGTAGACT TTTCATCCTG CTATCACATA TCATTTTGAC AGGCGAAACA 2160 

ATATTAAAGA AACTCCCCTG TAAATTAAGC TAGCAAATAC AGGGGAGAAA TTTATTTTTT 2220 

AGAGAGTACT ATCCGTATCC TTTTTGGAAG ATTTTGAAAA TATTTTTCTA ATTAAGTCAT 2280 

CCATATAAGG ACCAAATATA CCAACTACTA AACCAATAAT AAAACTTTTA AAATCCATAA 2340 

TTACCACCAA CATATTGCTG CATAGGCTAC ACCTCCAAGT ATAGCTCCAC CTGCAGCACC 2400 

AGTTACACCT ATTCCTATAG CAAATGGTCC CAATAGAAAT GTCAAACCGT TGTTGCACAC 2460 

CCATCAATTG CGCCATATGC AACCCCTGCT GCACAACTAA TTTTTCTTCC CCAATCAATA 2520

TCTCCACCTT CAACGCAAGC AAGCATTTCA TTATCCATAA CTGCAAATTG TGACATCATT 2580 

TTTGTATCCA TATAGTGTAT CACTTTTCAG TTACGGAACA AGTTTAATAT AAAAATTATC 2640 

AAAAAAACAT AGGCAATAAA GAGAAAAATT AATTTATCAT AGATTAGAAA TAATATGACA 2700 

AAACAATTCA ATGATGTTAA TTCAATAGTC TTTTGTTTTT TATCGGAGAT ACTTATGGAT 2760

AGATAAATAA GATAGGTTTG AAAAGCGAAG AGAATAATAA AGAATATAGC CTTCATAAAA 2820 

TTTAGCTTTC ATTTTTATGA TGTAGCGGTA TAGGCTAAAT ATCCACAAAC CACTGCTCCT 2880 

CCAATTCCTC CTATTGCAGC GCCCCATGGT CCTAGAAGTC TCCCATATTT CACTCCACCC 2940 

GCTGCACAAC CTAAAGCAGC AACTACAGCT GCTCCTCCGG AATTACCTCC ATAAACCTCA 3000 

CTCAGCATTG TTTCATTTAT ATTACAATAA GTATTCATAC AAGTCTCCTT TTATTAAAAT 3060 

CCACCCGTTG CCCCTGTTAC TCCTGCCCAA AGATCCACAC CAAATTTAGC TCCTATGTAT 3120 

CCACATGCTC CCATAAATGG TGCTCCAACA CCACTCGCAG CACAAATAGC TGTCCCTAGC 3180 

CCCCAGCCAC CAAAAGCAGC ACCACCACCT TCTAAGACAT TAGTTTGCCA ATTATTCTTG 3240 

CCTCCTTCAA TACTAGATAA CATAGTTATA TCCATTTCAT GAAATTGTTC CATAATTTTT 3300 

GTATCCATGA CAAATACTCT TTTTTATTTT TAATTTTTGT CTTGTTGTAA CTTTGACAAG 3360 

TTTAGTATAT CATCGTTTTT TAAAATTTTT CATCCAGATT TTGAATAGTC ATCGAAACGT 3420 
CTTGAATTGC AAAAATTACA TTAGACTTCC TGCAAAACTA GAATCCTAGT TCATGATTGA 3480

TAATACCAGC ACTCAAATTC ATTCGTAATC CGAAGCGTTT ACGATGACTT CGATAGGTTG 3540 

TTGAAAACAT TTTAAACGTT TTTACTTTGG CAAAGATGTT CTCAACCTTG CTTCTCTCCT 3600 

TAGATAGCGC ATGGTTACAG GCTTTATCTT CAACTGTTAG CGGTTTGAGT TTGCTGGATT 3660 

TACGTGAAGT TTGTGCTTGA GGATATATCT TCATGAGCCC TTGATAACCA CTGTCAGCCA 3720 

AGATTTTACC AGCTTGTCCG ATATTTCTGC GACTCATTTT GAACAACTTC ATATCATGAC 3780 

AATAGTTCAC AGTGATATCC AAAGAAACAA TTCTCCCTTG ACTTGTGACA ATCGCTTGAG 3840

TCTTCATAGC GTGAAATTTC TTTTTACCAG AATCATTCGC TAATTCTTTT TTTAGGGCGA 3900

TTGATTTTTA CTTCCGTCGC ATCAATCATT ACCGTGTCCT CAGAACTGAG AGGAGTTCTT 3960

GAAATCGTAA CACCACTTTG AACAAGAGTT ACTTCAACCC ATTGGCTCCG ACGGAGTAAG 4020

TTGCTTTCGT GAACACCAAA ATCAGCCGCA ATTTCTTCAT AAGTGCGGTA TTCTCGCACA 4080

TATTGAAGAG TGGCCATAAG AAGGTCTTCT AGGCTTAATT TAGGTTTTCG TCCACCTTTT 4140

GCGTGTTTAA GTTGATAAGC TGTTTTTAAT ACAGCTAGCA TCTCTTCAAA AGTCGTGCGC 4200

TGAACACCAA CAAGACGCTT AAATCGTGCA TCAGTTAGTT GTTTACTTGC TTCATAATTC 4260

ATAGAACTAT AGTAAAATGA AATAAGAACA GGATAAATCG ATCAGGACAG TCAAATCGAT 4320

TTCTAACAAT GTTTTAGAAG TAGAGGCGTA CTATTCTAGT TTCAATCTAC TATACTATAC 4380

CATATTTTGT TTCGCAGGGA ATCTATTATA AAAGGGTAAG TATTGCAAAA ACACTTACCC 4440

TTTTCTTTTA TACTTCATTA AGCTCTACTT TTTATAATAC TTCAAGCCCC ACATGAGCAG 4500

AAGCATGATG ATTAAGCAGA GAACAGCGCC AATATAAGCG ATTATTTGTT GGTAGGATTC 4560

TCCTGCTGTG ATACCTCTAT ACAAACAAAT AATAGACATA AAACCTGTCA AGCCGATGAA 4620

CATAAGTTGA TTGGTTCTAG GACTAACCAA ATCATCATCT TCAAACTCTC TTATCCTCAT 4680

TTCCCTAGTG AGATAAACAG TAACCAAAAT AGAAGCCAAG TTAATAACTA CTAAAAGAAA 4740

TTGGAAAACT ACGGAAAAAT TTAAAAACTG ACGAGATAGA AATAGATAAG TAGAAACAAG 4800

CAAGGGCAAC TGACCTAAGA ACAATCTCGC AAGGAAGATG TTCCGTTTTT TAGCAAGAAA 4860

AGTTTTCATT TCTTTTCTCC TTTCTTTTTA TTGATAGCAA AATAGATCAT AACTGCAATC 4920

ACATAGGCTA TGGTATAAAA TAGCTGATAC CAAGCACTCT CCCTAAGCGG ATATAGAAAG 4980

ATGGACATGA TTAGATACAG AACGAAAATA ATCAGTATTT TTTTCTTCAT AAGATTTCCT 5040

CCTAAATGTG CGATTTATCT TAGTTGAGCA AGAACATTTA CACTGCTAGT ATAGCACTTA 5100

TTTTGACCTT GGATCACTCA AATCATAAAT GGTCATCAAA ACCTCTTGAA TTGTAAAAAT 5160

TAAAAAAGCA AGCATGAAAA ACATACTTTC CTCTTTATAT TGTATTGATA CCAACTTGTT 5220

TGTAGACTTT TCATCCTGCT ATCACATATC ATTTTGACAG GCGAAACAAT ATTAAAGAAA 5280

CTCCCCTGTA AATTAAGCTA GCAAATACAG GGGAGAAATT TATTTTTTAG AGAGTACTAT 5340

CCGTATCCTT TTTGGAAGAT TTTGAAAATA TTTTTCTAAT TAAGTCATCC ATATAAGGAC 5400

CAAATATACC AACTACTAAA CCAATAATAA AACTTTTAAA ATCCATAATT ACCACCAACA 5460

TGTTGCTGCA TAGGCTACAC CTCCAAGTAT AGCTCCACCC GCAGCACCAG TTGCTGCACC 5520

TTGCCATGTT CCTGTTTTAA TGCCTAGTTG AAGACCTCTT GCTGCTCCTC CTCCAACACC 5580

TGCTTTGGCA AAATCTCCCC AATTGCATCC GCCACCTTCA ACGCAAGCAA GCATTTCAGT 5640

ATCCATAACA GAAAATTGTG ACATCATTTT TGTATCCATG ACAAATACTC CTTTTTTAAA 5700 

AAACTAAAAT AAATCAGAAT AGAATCCTCA TAATTTTACT ATAAGTCTTA CCAACTTAGT 5760 

CCCAATTTAT CACCAACCAT ACCTCCTAAG CATGTTAATC CACCCCCAAT TGCACCAATG 5820 

TGTGCTCCAA CAAATGCACC AGCAAGTCCA GCTACTCCTA AAGTGGCCAA ACCTGCTCCA 5880 

GTTCCACCAG TTATAATTCC CGTAGTGACT CCTGTAATCA GTGCATTTTG ACAATCAGTG 5940 

GAGCTATACC CCCCTTCAAC TTTCGCAAGC ATTTCAGTAT CCATAACCTC TAACTGTGAC 6000 

AACATTTTTG TATTCATGAT GAATACCTCC TTTTTATTTT CAATTTGTTA CCAAAGTCTT 6060 

AAATTCAATA AACAAATAGA TTTTTTATAG TATCTTTTTG ATTTTCTTAA AAAAGTATAT 6120 

ACGTCTACTA TCTTCTTAAA GGTAGCAGTA CCTATTTTTT AGTCTAAGAT TTCAATAATC 6180 

TTGAGTATCT AAAATATCTT AATTTCGTTA TTCTCCTTGC AATAAAAAGT TTTACTATAC 6240 

TATTTATTAA CTTGCAGAAA GCAAAAAATA TTAGTAAATA ATAGTTTATA GTTAAGTTTT 6300 

TTATTCCTAC CAATCCATCA ACTAAGTAAA GCATCAACGA TTACATAAAC GATTGATAAT 6360

ATAATTAAAA TTTTGCTAAC TATCTTATTC TCATCATTCT TAGATAACTT TGATATTTTG 6420 

TAAGTAAGTA AATAAGACAG TAAATTAATA GCGATAATAA TACTATATTT AAGAATCATA 6480

ATCTTACAAA GAGGACATAA TTCCTGAACC TACACAAATA AGTGTTGCTG CTCCCCCAGT 6540 

TATCGGACCA GTCGCAGCAG CTAATAGTAC TGCTCCAATA CAACCACCGA TTGCAGATCC 6600 

TAAATTGCCT CTTCCTCCAC TAACTATTTC GAGTTCTTCA TTATCCATAA CAGAAAATTG 6660 

TTCCATCATT TTTGTATTCA TGACAAATAC TCCTTTTTTC TTTTTTTATT TTTGTCTTGT 6720 

TGTAACTTTG ATAAGTTTAG TATATCATCG TTTTTTAAAA TTTTTCATCC AGATCTTGAA 6780 

TTGTCATCGA AACGTCTTGA ATTAGCTTTT TTATTTCAAG CCACCTCTAA ATGTTTAAAA 6840 

AAAATAATTT CTAATCACTT TTTTACCATT CAGGAAGTTT TAATGACTAT TCAAGATTTC 6900 

ATAAAATATG AACTTAGTTT TATGACATAA TAGACCTATC CACTATATGA AAGGAATTGC 6960 

CAATGACTTC TTATAAACGT ACATTTGTTC CTCAAATAGA TGCGAGAGAC TGTGGTGTCG 7020 

CTGCCTTAGC CTCGATTGCT AAATTCTATG GTTCAGATTT TTCTCTAGCT CACTTGAGAG 7080 

AACTTGCAAA GACCAATAAA GAAGGGACGA CTGCTCTTGG CATTGTAAAA GCCGCTGATG 7140 

AAATGGGCTT TGAAACAAGA CCTGTTCAAG CAGATAAAAC GCTGTTTGAC ATGAGTGATG 7200 

TCCCCTATCC ATTTATCGTT CACGTTAACA AAGAAGGAAA ACTCCAACAT TACTATGTTG 7260 

TCTATCAAAC AAAGAAAGAC TATCTGATTA TTGGTGATCC TGACCCTTCT GTAAAAATCA 7320 

CTAAAATGTC AAAAGAACGC TTTTTCTATG AATGGACTGG AGTAGCTATT TTTCTAGCTA 7380 
CCAAACCCAG CTATCAACCC CATAAAGATA AAAA6AATGG TCTACTAAGC AAGCTTCCTT 7440

CCTCTGATTT TCAAACAAAA ATCTCTCATT GCTTACATTG TTCTCTCAAG CTTATTGGTC 7500 

ACTATTATCA ATATAGGTGG TTCTTACTAT CTCCAAGGAA TCTTGGATGA ATACATTCCA 7560 

AATCAGATGA AATCAACTTT AGGAATCATC TCAGTTGGTC TGGTTATCAC CTATATCCTC 7620 

CAACAAGTCA TGAGCTTCTC CAGAGATTAT CTCCTAACCG TTCTGAGTCA GAGATTAAGT 7680 

ATTGATGTGA TTTTATCCTA TATTCGCCAT ATTTTTGAAC TTCCCATGTC TTTCTTTGCG 7740 

ACACGTCGTA CAGGAGAAAT CATTTCACGA TTCACAGATG CTAACTCTAT TATAGATGCC 7800 

TTGGCTTCTA CCATTCTTTC TCTTTTTCTG GATGTTTCTA TTCTGATTCT TGTAGGAGGC 7860 

GTCTTACTGG CACAAAACCC TAATCTCTTC CTTCTTTCTC TTATTTCCAT TCCTATATAC 7920 

ATGTTCATCA TCTTTTCTTT TATGAAACCT TTCGAAAAAA TGAACCATGA TGTCATGCAA 7980 

AGTAATTCTA TGGTTAGCTC TGCCATTATC GAAGATATCA ACGGGATTGA AACTATAAAG 8040 

TCGCTCACGA GTGAAGAAAA TCGCTATCAA AATATAGACA GCGAATTTGT AGATTATTTG 8100 

GAAAAATCCT TTAAGCTCAG TAAATATTCT ATTTTACAAA CGAGTTTAAA GCAGGGAACA 8160 

AAATTAGTTC TGAATATCCT TATCCTATGG TTTGGCGCTC AATTAGTCAT GTCAAGTAAA 8220 

ATTTCTATCG GTCAGCTGAT TACCTTTAAC ACACTTTTTT CTTACTTTAC AACTCCTATG 8280 

GAAAATATTA TCAACCTCCA AACCAAACTC CAATCTGCGA AGGTCGCTAA TAACCGTTTG 8340 

AACGAAGTCT ATCTAGTCGA ATCTGAATTT CAAGTTCAAG AAAACCCTGT TCATTCACAT 8400 

TTTTTGATGG GCGATATTGA ATTTGATGAC CTTTCTTATA AGTATGGTTT TGGATGAGAT 8460 

ACCTTAACAG ATATTAATCT CACGATTAAA CAAGGAGATA AGGTTAGCCT AGTTGGAGTT 8520 

AGTGGTTCTG GTAAAACAAC TTTAGCCAAA ATGATTGTCA ATTTCTTTGA ACCCTACAAA 8580 

GGGCATATTT CCATCAATCA TCAGGATATT AAAAACATTG ATAAAAAAGT CTTGCGCCGT 8640 

CATATTAATT ACCTACCCCA ACAAGCCTAT ATCTTTAATG GCTCTATTTT GGAAAACTTA 8700 

ACCTTGGGCG GTAATCATAT GATTAGTCAA GAAGATATTC TAAAAGCTTG TGAAGTAGCT 8760 

GAAATCCGTC AAGACATTGA AAGAATGCCT ATGGGCTATC AAACTCAGCT CTCTGATGGA 8820 

GCTGGTCTAT CAGGAGGACA GAAGCAACGA ATCGCTCTCG CTCGTGCTCT TTTAACTAAA 8880 

TCTCCTGTTT TAATACTAGA TGAAGCTACT AGCGGTCTTG ATGTCTTGAC TGAGAAAAAG 8940 

GTTATAGATA ATCTTATGTC TCTAACTGAT AAAACCATTC TCTTTGTAGC CCATCGTCTC 9000 

AGTATAGCCG AACGAACCAA CCGTGTCATT GTTCTTGACC AGGGGAAAAT CATTGAAGTT 9060 

GGTA 9064 
(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7780 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

CTCCATTTTT TTGATTTCAT AAATAAACAA CCTCTCTGTT AATTTTGTAT AATTATAACG 60 

ATATCCAAGT TACTTGTCAA GTGTTTTTTA AATTTTTATC TCAAAAATAT TTTTTCGTTC 120 

AAAAAAAGGA GCCATCAGTT GATTTCAAGC TCCCTTTTAT ACAGAATTAA ACTATTTTAT 180 

AGTTCGACAA TCTTACCTGT TTCAAAGTAG ACAACCCATT CACAGATATT TTTAGCATAG 240 

TCACCGATAC GCTCCAAGTA GGAAATAACT TGGAAATAAT CACGACCCGT AACAATGGCT 300 

TCTGGATTTT TCTTAATCTC TTCAGTCGCA AGGTCACGGA TAGTTTCAAA ATAGTGGTTA 360 

ATTTGCTCAT CCATGGAGGC CACCCGGTAT GCGTCGTCAA CAGAACCATT AAGATAAAGA 420 

TCAAGTGCTG CTTCCACAAC GCTTTTAACT TCACGTCCCA TTTTTTTAAT TTCTTCCTCT 480 

ACAGCTGGAA TGCGCTCTTC CCCCTTCATA CGGATGGTTG CCTGGGCAAT GGCTACAGCG 540 

TGATCCCCCA TACGCTCCAC ATCTGATACA GCCTTAAGGA CAGTCAAGAC TGTACGCAAA 600 

TCTTGAGAGA CTGGTTGTTG GAGTGCGATC ATTTCAAATG ATTTCTTTTC CAGTTTCACT 660 

TCGTATTCAT TTACTTCTGC ATCATCTTCG ATGACCTCTT TTGCCAGGTC ACGGTCATGC 720 

GTGACAAAAG CACGTACCGT ACGATTGATT TGTGAGAGCA CTTCTTGTCC CATAGCGTAG 780 

AACTGGTTAT GTAATTTCTC TAAATCTTCT TCAAATTGAG ATCGTAACAT CTTTCATCTC 840 

CTTATCCAAA TTTTCCTGTA ATATAGTCTT CCGTTTCCTT GTGTTGGGGA TCAAGGAACA 900 

TCTGCTTGGT ATCATTAAAT TCAATCAAAT CTCCATCTAG GAAAAATCCT GTCTTATCAG 960 

AGATACGTGA AGCTTGCTGC ATGGAACGGG TTACCAGAAG CATGGTGTAC TTGTCTTTTA 1020 

GACCATACAA GGTTTCCTCA ATTTTACCAG CTGAAATCGG ATCCAAAGCC GAAGTTGGCT 1080 

CATCCAAGAG GATGATTTTA GGACTAGTTG CCAAGACACG GGCCACGCAG ACACGCTGCT 1140 

GTTGACCACC TGACAATCCA ATAGCTGAAT CATATAGACG ATCCTTGACC TCATCCCAGA 1200 

TAGAGGCACC TTGCAAGGCT TTTTCTACGG CTTCATCCAG AACCTGCTTA TCCTTAATTC 1260 

CATTGATACG AAGCCCGTAG ACAACATTCT CATAGATAGT CATAGGGAAA GGATTAGGTT 1320 

GTTGGAAAAC CATTCCGATT TCCTTACGTA ATTCAACCGT ATCTGTACGC GGACTGTAGA 1380 

TGTTGTGACC ATTGTACACC ACGGATCCAG TTGTGGTCAC CTCTGGATTG AGATCTCCCA 1440 
TGCGGTTGAG AGACTTGAGG AGGGTTGACT TCCCTGATCC AGATGGACCA ATCAAGGCTG 1500

TAATTTCCTT AGGTTGGAAA GATAGGGAAA CACTATTCAA AGCCTTCTTT TTATTATAAT 1560 

AAACGGACAG GTCTGATACC TGTAAAATCG CATCTGTCAT ACGGTTTCCT TTCTAACCAA 1620 

AGTGACCAGA TACATAGTCA TTGGTGGACT GTAGCTTGGC ATTTTGGAAA ATAGTTGCAG 1680 

TCTTGTCATA CTCAATCAAA TCACCCAAGT AAAAGAAGCC TGTATAGTCA CTTGCACGAG 1740 

CAGCCTGCTG CATATTATGC GTTACAATGA TGATGGTAAA GTTTTTCTTG AGCTCAAACA 1800 

TGGTCTCTTC TAGTTGCATG GTCGCAATCG GATCCAAGGC TGAGGCTGGC TCATCCATTA 1860 

AGAGGATATC TGGCTTAACA GAGATGGCAC GAGCGATACA GAGACGTTGT TGCTGACCAC 1920 

CTGATAAGGT CAAGGCTGAC TTGTGGAGAT CGTCTTTAAC CTGATCCCAG AGGGCAGCCT 1980 

GACGAAGGGA GGTTTCTACG ATTTCATCTA GGACTTGCTT ATCCTTAACT CCAGCACGTT 2040 

CATGCGCAAA GGTAATATTA CGGTAAATTG ACTTAGCAAA TGGATTGGGA CGTTGAAAAA 2100 

CCATTCCAAT GTGTTTACGC ATTTCATAAA CGTTGATTTC TGGACGGTTG ACATCAATTC 2160 

CACGATAGAG AATCTGCCCA GTTACTTTAG CAATATCAAT AGTATCATTC ATGCGATTGA 2220 

GACTGCGTAA GTAGGTAGAT TTCCCCGATC CCGACGGGCC AATCAAAGCT GTAATTTTAT 2280 

TTCTTTCAAA TTGCATATCA ATCCCCTTAA TGGATTCATT TTTACCATAG TAAACATGGA 2340 

CATCCTTAGT AGAAAGGGCT ACTTTTTCTT CAGGAAAGGT AAGGATATGC TTCTCATCCC 2400 

AGTTATATGT TGACATGGCT TCTCCTTTAG GCAGCGGTTA ATTTCTTGTG TAGATAGCTT 2460 

CCGAACTTAC GAGCTCCAAA GTTAAAAATC AGGATAAAGA TCAGGAGCAC AGCGGCAGAA 2520 

CCTGCTGATA CAATGGTTCC ATCTGGAATA GTGCCTTCAC TATTGACTTT CCAGATATGG 2580 

ACAGCCAAGG TTTCTGCTTG ACGGAAGATA GAGATGGGGC TAGTCACACT GAGGATATTC 2640 

CAGTTAGACC AGTCAAGAGC TGGCGCCGAT TGCCCTGCTG TATAGATCAG AGCTGCAGCT 2700 

TCGCCAAAGA TACGACCAGA TGCCAAGACG ACACCCGTTA CAATACCTGG AAGCGCTTCC 2760 

GGAATAACAA CATGAACCAC TGTCTCCCAG CGAGAAATCC CAAGAGCCAG ACCAGCCTCA 2820 

CGTTGGGTAT GGTGAACGTG TTTCAAACTA TCCTCTACAT TACGCGTCAT CTGAGGCAAG 2880 

TTAAAGACTG TCAAGGCCAA GGCACCTGAA ATGATTGAAA ATCCATACTC AAACTGGACT 2940 

ACAAAGATCA AGTAACCAAA GAGACCCACC ACCACTGATG GTAAAGAGGA CAAAATTTCA 3000 

ATACAAGTCC GCACAAAGTT GGTAACAGGA CCTTTTTTAG CATATTCAGC CAAGTAAATC 3060 

CCAGCTCCCA TAGAAAGAGG TACAGAAATA ATCAAGGTAA TGACCAATAG GAAAAAGGAA 3120 

TTGTAAAGCT GAATGCCAAT CCCACCACCT GCTTGAAAAG CAGAAGACCT TCCAGTCAAG 3180 
AAAGACCAAG AGATATGGGG CAAGCCCCGA ACCAAGATAT AGAGAATCAA GGAAGCCAAG 3240

ATTGTCACAA TGATGCTAGC AATCGTATAG AGGACAGCTG TTGCAAGTTT ATCTAATTTC 3300 

TTAGCGCGCA TAATTTTTCT TTCCTCTTTC TTTCGTAATC AATTTAATCA CACTGTTAAA 3360 

AACTAAGCTC ATCAAGAGCA GTACCAAGGC CAGTGACCAG AGAACATTAT TATTTACAGT 3420 

TCCCATGACA GTGTTCCCAA TTCCCATAGT TAATATAGAA GTTAAAGTTG CAGCTGGTGT 3480 

GGTCAAGGAA GTTGGGATAA CAGCTGAGTT TCCGACAACC ATCTGGATAG CTAGAGCCTC 3540 

ACCAAAGGCA CGCGCCATCC CAAAGACCAC TGCAGTGAAA ATACCAGAAC GGGCCGCCTT 3600 

CAAGATCACA CGCCAGATAG TCTGCCAGCG AGTGGCTCCC ATAGCGAAAC TGGCTTCACG 3660 

ATAATAACGA GGAACCGCAC GCAAGCTATC CGTTGTCATA AAGGTTACGG TCGGCAAAAT 3720 

CATGACAAAG AGGACGGAAA TCCCTGACAA AATCCCAAAA CCAGTCCCAC CAAAGACACT 3780 

GCGAACAAAG GGAACGACGA CTTGCAAGCC AATAAATCCG TACACTACTG AAGGAATCCC 3840

AACCAGGAGT TCAATAGCTG GTTGCAAAAT CTTCGCCCCT TTTGGTGATA CTTCGGTCAT 3900 

AAAAACTGCT GCACCAATAG CAAAGGGTGT TGCGATAAGG GCTGAGAGAA TGGTAACGAT 3960 

AAAGGAACCC AAAATCATAG GAAGGGCACC AAATTCTTTA CTAGAAGGAT TCCAAGTTCC 4020 

TCCCAAAAGA AAGTCAAAGA TATTCACACC ATTGACAAAG AAGGTCGACA AGCCTTTTTG 4080 

CGCTACGAAA ACCAAAATCA TGGCCACAAG GATGACTATC AAAGAAAGAC AGGCAAAGGT 4140 

CAAACCTTTT CCTAATTTCT CCAGACGAGA ATTCTTTGAT GGAAGCAACA TTTTCTTAGC 4200

TAATTCTTCT TGATTCATTA TTGTCTCCCT TCCAACACTG TCACAGTTCC GGCAGCATCT 4260 

TTTTCAACCT TCATTTCCTT AATCGGAATA TACTTCAATC CTTTGACAAT CCCTTCTTGG 4320 

GTCTCATCCG AGAGAACAAA ATTGAGAAAT TCTGCAGCCA ACTCATTGGG CTGCCCCAAT 4380 

GTATACATAT GCTCATAAGA CCACAAGGGC CAATTATTGC TACTTATATT TTCTGGACTT 4440 

AAGTCATAGC CATTCAACTT CATGCTTTTG ACCGAATCAT CTATATAGGT AAGAGATAAA 4500 

TAAGAGATAG CTCCTGGACT TTTTGATACG ATTGATTTTA CCGCTCCATT TGAATCCTGC 4560 

TCCTGACTTT GCATGGCAGA CTGACCTTCC ATAATGACAG TATCAAAGGT AGCACGAGAG 4620 

CCAGAGCCGG CTGCCCGATT GATAACAGAG ATGGGTAAGT CCTTACCACC AACCTCTTTC 4680 

CAATTGGTTA CCTCACCTAT GAAGATTTGA CGAAGTTGCT CTGTCGTTAG GTTATCAACA 4740 

TCAACCTCCT TATTGACAAT CAGAGCCAAG CCAGCTACCG CGACCTTGTG GTCAACAAGA 4800 

GCAGAAGCAT CAATTCCGTC TTTTTCCTCA GCAAATACAT CTGAGTTTCC TATATCAACT 4860 

GCCCCAGACT GAACCTGGGA CAAGCCTGTA CCAGAACCTC CCCCTTGGAC ATTGACCGTT 4920 

TTTCCAACAT GGATCGTGCC AAATTCATCT GCCGCTACTT CAACCAAGGG TTGCAAGGCA 4980
GTTGAGCCAA CAGCCGTTAT GGATTCTCCA CGATCAATCC AGCTAGCACA GCCTACTAAA 5040

CAAGCCGTCA GCCAAAAAGC GATAAGAGAC AGAGCAAGCT TTTTTCTTTT TTTCACTGTT 5100 

TTTCTCCTCG AAAATAATTA TGAATACTGT GAATTTTTTA AGTAGTTCTT TATGAGTTGA 5160 

CGCATGAATT CTTACCAAAT TTCTGCGCAA TTGATTATTT ATATAATATA GGCTATATTA 5220 

CTCTTTCCTA ACCTCCTTTT TTCATATGTG GATAAAATCT CTTGTCTATC CCTTCCCCCA 5280 

TTGTCACCCA TTATAGTCAT TTCGTGTCTC TTTTTCCCCT TTTTAATGCA AGGGAAATTA 5340 

CTCTCCTTAG ATGATAATCC AAAAGCTAGA AAGGTATCTC AAACCTCTCT ACTCTCCCAG 5400 

ACTAGTTTAC AACTAAAAGG AAAAGATTCT ATTTTATGAG AAATCTAGTT TACAAGCGGT 5460 

AAGAACGCTA ATAACTAAAC TTCTTGTACT CTTTGAAAAT CTCTTCAAAC CAGTGTTTTG 5520 

AGCTATCTAT GGCTAGCTTC CTAGTTTGCT CTTTGATTTT CATTGAGTAG TAAAACTACA 5580 

TGTAATGGCA ATCAAGATAT CAAGAATCAT CCTACTAAAA AAATCCATAC TTTCACTATA 5640 

ACATAGAATA AGATATTTGA CTAGCATTTT CATTTGAATC TGAGGCCTTT TGGAAAATAA 5700 

TTTTTCAAAA CATTTCCAGT AACCTTTGCA AAGCCCAAGC CATTGCCTTT AACCAAAACT 5760 

TGGTACCAAC CATTTGGCAG ACTTTCTGCC AGCTGAACGG TTTCTCCAGC CGCATACTTG 5820 

ACAAACGCTT CTTGGCCAAT TTCAACCGAC TGTTCGACCT GACTCGGTTT CAAGGCTAAA 5880 

CCAAGAGCGA AACTGGGCTC AAAGCGTTTC TTCTTAAAAG TACCCAGATG CAGTCCATTG 5940 

CGAGCAATCT TGAGCTTCCA TAAATCTGGC AAAAGTTCTG GCAAGAGATA AAGCTGGTCT 6000 

CCAAAAATCT GCAAGATACC CGGTAGATTG ACCTTCAAAT GGTTTTGGGC AAATTCCTGC 6060 

CACAAGGCAA CTTGTTCACG GCTGAGGTTA CTCTTACTTG CCTTAAATTT AGGAGCTGGA 6120 

TTGTTACCCT TAAACTGTAG ATGGGCAACA AACTGACCCT CTCCCTTAAA CTGATGAGGA 6180 

TACATCCGAG CCGTTTCTGG CAGGTCAATA CCAGCTACCA TTCCATTGAT ATGCTCTACT 6240 

GGCAACAAGT CAAAATCATA CTCTTCCAGC AACCAATTGA CAATCTCTTC GTTTTCCTCG 6300 

GGTGCCCAGG TACAGGTCGA ATAAACCAGA TGACCACCTT CAGCTAACAT GGTCACTGCA 6360 

TCCTCCAGAA TTTCTCTTTG CAAGCTAGCA CATTGACTCG GATAATCTAA GCTCCAATAG 6420 

TCCATAGCAT CAGGTTGCTT ACGAAACATT CCTTCACCAG AGCAAGGGGC ATCAAGAACG 6480 

ATTAAGTCAA AATAGCCTTT AAAGACCTTG ACCAAGCGGT CGGCAGATTC ATTGGTCACC 6540 

ACGACATTTG TCGCTCCAAA ACGCTCCATG TTTTCAACCA AAATCTTAGC CCGTTTGCTT 6600 

GAAATTTCAT TGGAAnCAAG TAGCCCCTCC CCTGCTAGAT AGGCTGCCAG TTGAGTTGAT 6660 

TTGCCCCCCG GTGCAGCAGC CAAGTCCAAG ACCTTCATAC CAGGACTGGG TTGGGCTACT 6720 
TGAGCCACCA TTTGAGCAGC AGGTTCTTGC GAATAAACTA AACCTGTAGC ATGCTCAGGC 6780

GATTTCCCTG AAACCTTCCC ATAGTGGCCC CAAGGGGTTT GAGTAATGGC ATCAGAAAAG 6840

GAAAGTTGCT CTTCTTTTAA GGGATTGACC CGAAAGGCCG AAACCGCTTC CTCCTCAAAA 6900

GAGGCAAGAA AATCTCTTGC CTCATCTCCT AGTATCTCTT TATATTTTTC AACAAATCCT 6960

TCTGGAAATT GCATTTAAGT TCTTTTCCTT TCGTAAATAT AGGACTGAAT TTCCTCCTGC 7020

ATCTCAAGAG GCACCATCAT GACCGGCTGT CTGGTTTGAA AATCAGGAGC TTCACCAAAA 7080

AGGGTCACAA CCCGATAGCC CAGACTTTCC CCTAAAATAC TAGCTGCGGC ATAATCCCAT 7140

GGTTGCAGAT AAGTGAGATA GGTCAACAAA CGCCCTGACA AAATCTTGGC AAAACTAATG 7200

GCCGCACTTC CATAGACACG AACACCAAGA ACCGCTCGGC TCAAATCAGC CAGCCCCCAT 7260

[illegible] CCAGCATACC ACTATTCCCT [illegible] AATCTCCAAG TGGTTTAGTT 7320

TTAAAAGGAG CTAGGGACCT ATCATTTAGA CAAACTGGAA ATTCCCCACC ACCGTGGTAA 7380

CAATCCCCTT TGACCACATC ATAAATCAGA CCAAACTGTC CCTGACCATT TTCAAAATAA 7440

GCCATCATAA CAGCAAAATC TTCCTGCTGG GCTACAAAAT TATTGGTACC ATCAATGGGA 7500

TCAATGACCC AAACCTTGCC CTCTTGAACC GAGGCTCGCA GACAACCTTC TTCAGCACAA 7560

ATCTTATCCT CAGGATAACG GGACAAAATC TCACCAACCA AGAGTTCCTG AACTTCTTTG 7620

TCCAGTCTGG TCACCAAATC TGTTGGAGAG GACTTGGTTT CAACACGCAA GTCTTCCTGC 7680

ATATGGTCAA GAATGTACTG ACCTGCTTTC TTAACAAGCT CTTTAGCAAA TTCAAATTTA 7740

CTTTCCAAGA GAAATCTTTC CTTCCCCTTT TTCTTTGGGG 7780



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4820 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

GTAATGATAT AGGAACACCA GGTGACCTGA TGGGACGTCG TAAGCCTATG AACTACTAGC 60 

TGCTAAAGGC TTTAAAGATG GTATGGTACC ATATATCTCA AACCAATACG AAGAAGAAGC 120 

CAAACAAAAG GGCAAGACAA TCAATCTCTA CGGTAAAACA AGAGGTTTGG TTACAGATGA 180 

CTTGGTTTTG GAAAAGGTAT TTAATAACCA ATATCATACT TGGAGTGAGT TTAAGAAAGC 240

TATGTATCAA GAACGACAAG ATCAGTTTGA TAGATTGAAC AAAGTTACTT TTAATGATAC 300 

AACACAGCCT TGGCAAACAT TTGCCAAGAA AACTACAAGC AGTGTAGATG AATTACAGAA 360 
ATTAATGGAC GTTGCTGTTC GTAAGGATGC AGAACACAAT TACTACCATT GGAATAACTA 420

CAATCCAGAC ATAGATAGTG AAGTCCACAA GCTCAAGAGA GCAATCTTTA AAGCCTATCT 480

TGACCAAACA AATGATTTTA GAAGTTCAAT TTTTGAGAAT AAAAAATAGT GTCTACTATT 540

AGGAAATAAA GTTTAAAAAG GTGATGAAGA ACAAACCAAG ATTCAAGCAG GAATTCCTAC 600

TGATAATGAA GTAAGTTATG ATCTTATTTA TCAGCAGGAA ACTCTTCCTG CAACAGGTTC 660

ATCAACTTCT GAGCTTACAG CTTTAGGCCT ATTAGCTGTT GGTAGTTTAG TTCTTTTGGT 720

TCATAATATG ACGGGAACAG TTTTTTGCTC CCTCTGAAAA GTCATCATTT GATGGCTTTT 780

TTCTATATAG GGTAAAAGAT AGGGTAAAAG GCTATCATCG GACAAAATAA AGAAGGCATG 840

ATATAATATA AAGTAGATTT CTATGTCATA AAACAAGAAC TGTTTGGACA TCATTCATTT 900

GAAAACTCTC TATGTTCAAA CAATAGTAAA ATAAAATAGG GGATCTAAAT CCTTGCTATG 960

AAAGGAAAAA ACTCAATGGC TACTATTCAA TGGTTTCCTG GTCACATGTC TAAAGCTCGT 1020

CGACAGGTGC AGGAGAATTT AAAATTTGTT GATTTTGTGA CGATTTTAGT AGATGCACGC 1080

TTGCCTCTAT CTAGTCAAAA TCCTATGTTG ACCAAGATTG TTGGTGATAA ACCAAAACTC 1140

TTGATTTTAA ACAAGGCCGA CTTGGCTGAT CCAGCAATGA CCAAGGAATG GCGTCAGTAT 1200

TTTGAATCAC AAGGAATCCA GACGCTAGCT ATCAACTCCA AAGAGCAAGT GACTGTAAAA 1260

GTTGTAACAG ATGCGGCCAA GAAGCTCATG GCTGATAAGA TTGCTCGCCA GAAAGAACGT 1320

GGGATTCAGA TTGAAACCTT GCGTACTATG ATTATCGGGA TTCCAAACGC TGGTAAATCA 1380

ACTCTGATGA ACCGTTTGGC TGGTAAAAAG ATTGCTGTTG TTGGAAACAA GCCAGGGGTC 1440

ACAAAAGGTC AACAATGGCT TAAAACCAAT AAAGACCTGG AAATCTTGGA TACACCGGGG 1500

ATTCTCTGGC CTAAGTTTGA GGATGAAACT GTTGCACTTA AGTTGGCATT GACTGGAGCT 1560

ATCAAAGACC AGTTGCTTCC TATGGATGAG GTTACCATTT TTGGTATCAA TTATTTCAAA 1620

GAACATTATC CAGAAAAGCT GGCTGAACGC TTCAAACAAA TGAAAATTGA AGAAGAAGCG 1680

CCTGTGATTA TTATGGATAT GACCCGCGCC CTCGGTTTCC GTGATGACTA TGACCGTTTT 1740

TACAGTCTCT TCGTGAAGGA AGTCCGTGAT GGCAAACTCG GTAACTATAC CTTAGATACA 1800

TTGGAAGACC TCGATGGCAA CGATTAAAGA AATCAAAGAA TTCCTTGTGA CAGTCAAGGA 1860

GTTAGAAAGC CCTATTTTTT TAGAGCTTGA AAAGGATAAT CGCTCAGGAG TTCAAAAGGA 1920

AATCAGCAAG CGTAAAAGAG CCATTCAAGC TGAATTAGAT GAAAATTTGC GCTTGGAATC 1980

CATGCTTTCT TATGAAAAAG AACTTTATAA GCAAGGATTG ACCTTAATTG CAGGTATTGA 2040

TGAGGTTGGT CGTGGTCCTC TTGCTGGTCC TGTAGTCGCT GCGGCCGTTA TTTTATCTAA 2100
AAATTGTAAG ATTAAAGGTC TCAACGACAG CAAGAAAATT CCTAAAAAGA AACATCTGGA 2160

GATTTTCCAA GCCGTTCAAG ACCAAGCCTT GTCGATTGGA ATTGGTATCA TAGATAATCA 2220

GGTCATCGAC CAAGTCAACA TCTATGAAGC AACCAAACTA GCCATGCAAG AAGCAATCTC 2280

CCAGCTCAGC CCTCAACCAG AGCACCTTTT GATTGATGCC ATGAAACTGG ACTTGCCCAT 2340

TTCACAAACC TCCATTATCA AAGGAGATGC CAACTCCCTC TCTATCGCAG CAGCATCTAT 2400

AGTAGCCAAG GTAACACGTG ATGAATTGCT GAAAGAATAC GATCAGCAGT TCCCTGGCTA 2460

TGATTTCGCT ACTAATGCAG GATATGGCAC AGCTAAACAT CTGGAAGGCC TCACAAAACT 2520

AGGAGTTACC CCAATTCACC GAACCAGCTT TGAACCCGTT AAATCACTGG TTTTAGGTAA 2580

AAAAGAAAGT TAATTGAAAG GAAATAACAT GGAGGAACAG TCGGAAATAG TCCGTTCTAA 2640

GAAAGAATTC GCCTTTGCAT CCAGCACTAT ACTATCCCAA GTTGGTCGAG GAATCATTGT 2700

CGGCCTCATC GTTGGAATTA TCGTCGGATC CTTTCGTTTC TTAATTGAAA AGGGCTTCCA 2760

CCTGATACAA GGAGTTTATC AAGATCAAGG GTACTTAGTG CGCAATCTTT [illegible] 2820

TTTGTTTTAT ATACTCATCT GTTGGCTCAG TGCCAAACTA ACACGGTCAG AAAAAGATAT 2880

TAAAGGCTCA GGAATTCCTC AAGTCGAAGC CGAACTGAAA GGCCTCATGT CCCTCAACTG 2940

GTGGGGCATT CTTTGGAAAA AATATGTGCT AGGTATTCTT GCTATTGCCA GTGGACTCAT 3000

GCTGGGTCGA GAGGGACCCA GCATTCAACT TGGAGCAGTT GGTGGTAAAG GAATTGCCAA 3060

GTGGCTCAAA TCCAGTCCAG TAGAGGAACG TTCCTTGATT GCCAGTGGAG CTGCAGCAGG 3120

TTTAGCCGCA GCCTTTAATG CTCCTATTGC AGCACTTCTC TTTGTTGTAG AAGAAGTCTA 3180

TCACCATTTT TCGCGCTTTT TCTGGGTCTC AACTCTAGCA GCCAGCATCG TAGCAAACTT 3240

TGTGTCTCTA CTCATGTTCG GTTTGACACC AGTATTGGAT ATGCCAGATA ACATTCCTCC 3300

CATGACCCTA GATCAGTATT GGATATATCT CGTCATGGGA ATTTTCCTTG GATTTTCAGG 3360

TTTTCTCTAT GAGAAAGCTG TATTAAACGT TGGAAGAGTT TATGACTTGA TTGGTCAAAA 3420

AATCCATTTG GATAGGGCTT ATTATCCCAT CTTGGCTTTT ATCCTTATCA TACCAGTCGG 3480

AATCTTCTTA CCTCAAATCA TTGGTGGCGG AAATCAGCTT GTCCTTTCTT TAACTGAACA 3540

AAATTTTAGT TTCCAAGTTT TATTAGCTTA CTTTTTAATC CGCTTTATTT GGAGTATGAT 3600

TAGCTATGGA AGTGGACTGC CAGGAGGAAT TTTCCTCCCC ATTTTAGCTC TTGGTTCTTT 3660

GCTTGGTGCC TTAGTTGGTG TTATCTGTGT CAATCTTGGA CTTGTCAGTC AAGAGCAATT 3720

CCCTATATTT GTCATTCTAG GAATGAGTGG CTATTTTGGA GCCATATCAA AAGCTCCCTT 3780

AACCGCTATG ATCCTCGTAA CTGAGATGGT AGGAGATATT CGCAACCTTA TGCCACTTGG 3840

TCTTGTCACT CTTGTTTCTT ATATTATCAT GGATTTGCTC AAAGGTACGC CAGTCTATGA 3900

AGCCATGCTG GAAAAAATGC TTCCAGAAGA AGTATCTAGC GAAGGAGAAG TTACACTTAT 3960

CGAAATACCA GTTTCTGATA AAATTGCTGG GAAACAAGTT CATGAACTCA ACTTACCACA 4020 

CAACGTCCTC ATCACAACTC AAGTCCATAA TGGCAAGAGC CAAACAGTTA ACGGCTCAAC 4080 

CAGAATGTAT CTGGGTGATA TGATTCACCT GGTTATTCCA AAAAGTGAAA TTGGAAAAGT 4140 

CAAAGATTTG TTGTTGTAGT ATGAGTATTT ACATAATTTA TGTTATGTAA ATGATCAGTT 4200 

TGATTTATTT AGAAAACCGA TTCTCAGGAA TGAGATCGGT TATTTTTTAC TGATGAGGAA 4260 

TTTTACATAT AAATAATTGA ACTTTATTAA AAATAAGACT ATAATTAAGT TAGAAATGAT 4320 

AAAGTATAAA GCTAGAAAGG AGTTTACTGT ATCAAATCTG TACAGTAAGA TTAAAATCAT 4380 

GAAAAAGAAA ACAATAGCAA TTATATAGAG AAATGAAATA GAAATAGGAT AAAACAATCA 4440 

GGACAATCAA ATCAATTTCT AGCAATGTTT TAGAAGTCCA GATGTACTAT TCTAGTTTCA 4500 

ATCTATTATA CAATGTGTTT TGTATCTCAT AGCTCCTTAT ATAGCTCTTC AGTTATGTAG 4560 

TATTAACAGA AGTTTAGTGG GTGAGATTTT TATTATTTTC CTTATTCTGT TTTGTTTGTA 4620 

GGTCTAAGTC TTTTTATCAC TTTGAAAAAC TCCTATAACA TCTTTCCGAA AAACTATAAT 4680 

TTTCTTGAAA AATATACAAG TCTATGCTAT ACTACTAGTA TACTTACTTA TGGAGAAAAT 4740 

ACATGAAACG TGAGATTTTA CTGGAACGAA TCGACAAACT AAAACAACTC ATGCCCTGGT 4800 

AAGTTCTGGA ATACTACCAA 4820 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21338 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

CTACGACATC ATGATTAACA GTCATGCGCT ACTACCAACT GAGCTATGGC GGATAAAATA 60 

GTCCGTACGG GATTCGAACC CGTGTTACCG CCGTGAAAAG GCGGTGTCTT AACCCCTTGA 120 

CCAACGGACC TTCTATCTGT AGCAGATATA ACCATTATAT CAATTTCTTG CTAATTGTCA 180 

ATCACTTTTG AGATTTTTTC TCTAAAATAT CTTTTAATTT TCTAATTTTT AATCTTGAAA 240 

TAGGACAACG ATGGTCTTCA TAGAAAACAA TTTCTAAGTT TTTTCGATCA ATTTCTCTGA 300 

TATTACCTAT ATTTACCAAA AATGACTTGT GAGGAGAATA AAATCGCTGA GTATGTTTGT 360 

CCTTTTCCTG AATATCTGTC ATGGTACCAT AAAACTCTTT TGCAAAATTC TTACCAATAA 420 
TGCGCAATTT ATGAGATACC CCTGTTGTTT CAATATACAA AATATCATGG TAAGGAATTT 480

TTAAATCATT TCCCTTGTAA TTGTAGTCGA AATAATCTAC AACATCTTCA TTTTCAAGTA 540

ACATACTCTT CGTGTAGAAG ATATTTTGCT CAATTCTCTT CTTAAACATC TCATCATTGA 600

TATCCTTATC AACAAAATCT AGGGCTGATA CCTGGTATTT ATAGGTTAGA GTCGCAAACT 660

CTGATCGACT AGTGATAAAG ACGATAATAG CGTAAGGATT GTAATGACGA ATGAGCTGAG 720

CCACTTCAAA TCCCTTTTTC TCAATTCCAT GAATATCGAT ATCTAGGAAA TAAAGCTGAT 780

TTACTTCATC ATTTTCAATG TATTCTTCAA ATTCACGGAC TTTTCCCGTT GTCTTGTATG 840

ATATTGGAAT ATTCGATTCT TTCGAAATTT CATCCAATAT TCTCTCTAGT CTCACTTGAT 900

GTTCAATAAC ATCTTCTAAA ATTAAAACTT TCATTCAAAT TCCCTCTTAA ATCTAATGAT 960

TTGTCTAAAT GTACTGCCTT CCATCTCTGT TTCTAAAATA ATATTGTTGT ACTTATCTAG 1020

TAGTTCTTTC ACATTATTTA ATCCGACTCC GCGATTTCTT CCCTTAGTGG AGAATCCTAA 1080

GGCAAATAGA TCTCCTGAAG GAGTCATCGT CATTTTACAT GAATTCTGAA TCACAATAAC 1140

TGTTTCAGTT TCCATCTTAA TAACTGCTAC TTCCATCTGC TTTTTATAGC TATCAGCCGA 1200

TCCTTCGACA GCATTATTCA ATAAAACGCT CATGATACGA ACCAAATCCA ATAGTTCAAT 1260

TGGAAGCTTG GTAATCGTAT CTTTTACTTC CAGTGTAAAC TCTACACCAT TATTTCGAGC 1320

ATAGACAATT GACTGAGCAA CCAAACTTCG TAAAGCTGAG TCTTCTATGT TGTTCAAATC 1380

AAAGTAAGTG TACTTATCTG AACGCAATTT ATGATTTGCT TTGACTAAAA CTTCATTGTA 1440

AATTCTGTCA ATTTCCTGTA AATTACCACT GTCAATTGCC ATCTGCATGC TGACAAGCAT 1500

TCCAGCATAA TCATGTCGAA AACCACGGAT TTCATTATAC AGACCAACAA TTTCATCTGT 1560

GTAATTCTGT AAATGTTTCT GTTCAAATTT CTTCTGCTTC AAAGCAATCT CTTTCTCCAT 1620

TTGAACTTTA TGAGAATTCA TTGCAAAGAA GGTCAAAAGG AGAGAGATAA AGACAATAGA 1680

TGACAAAATA CTTCCAAAAC TATTCAAATG TTTAATCGTA CTTACCATAT CTGAAACGAA 1740

AGATACAATA TGTAGCAATA GTAAAGCAAA AAATACTTTT TTCAAGAAAG GATAAAGGTA 1800

GTCCTTGTCA AAATAGGCTA GTTCCAAATG GAAATAGTAA ATGATTTTTA ATGTAACAAA 1860

ATAGGTTAAC ACCGTCACAA CGAAAAAGAA TGGGAAATGA TATTGTAAAA CAAAATTATC 1920

TCCTGTTATA GAGGAGAAAA TTACGGACAG AAAGTTATGA GTGCTCTCAT ATAAAAGAGA 1980

TAGTAGTAAA CTTAGGAATA GTCCTCTATC CCTCTCATAC TGTTTCATCC ATCGAAAATA 2040

GGAATATAAG CCCAAAGGAA ATAAAAATCT TTCAATCCCT ATTTTATCTA AATATAGAAG 2100

ATAAAAGGAA AATTCAAGTA CTATTTCAGT TAGTAATGTA TAAGCACCAA AAACGTATAA 2160

TTCTTTTCTA TTTATTCGAC CTTTACAAAT TAAACGGTAA CTGTGACTAA TAATTAAAAA 2220
ATGAACAATA ACTGTCCCAA ATCCAAGTAA ATCCATTACT CTTTCTCCTT ATTTCATTAC 2280

TTTTTTCGTA GGAAAAGAAA ATCAAGGATG ATTCTTGAAA TCCTCATCTC CCCACCTTTA 2340

ATCTTTTGTA AGTCTTTTTC CTTCAAAGCT ACAAACTGTT CCAATTTAAC TGTGTTTTTC 2400

ATAATAAAAT CTCCTAAAAT GTTTTTTCTT GTAAGCTAAC TTACAAAAAC CATTATACAA 2460

AATGGAATTT CGTTTTAGAT AAAATTCTCT CAACTGTCAT TTTTTTCTCC CAAAGTGTAC 2520

TTTTTTAAGA AAAAAGCCGG GAAAATTCCC AGCTTTGCTA TTATATTGAT CCCAGCAGGA 2580

TTCGAACCTG CGACCGTTCG CTTAGAAGGC GAATGCTCTA TCCAGCTGAG CTATGAGACC 2640

TAATACAATT ATTCTACCAA AAATTCAATT AAAAGTCAAT TTTCTATTTA TGGTAGGGGA 2700

ATCCCTGCTG AATCGTAAAA GCGCGATAGA TTTGTTCAAC AAGAACTAGT CTCATTAACT 2760

GATGGGGTAA GGTTAGGCGA CCAAAACTGA CAGAAAGATT GGCTCTATTT TTTACAGATG 2820

ATGATAATCC TAAACTTCCC CCAATAATAA AAGTAAGAGT AGAAAATCCT TTTATAGAAG 2880

TTTCTTCTAA CTGCTTACTA AATTCTTCTG AGAAGAAAGT TTTCCCTTCA ATGGCTAACA 2940

CAATAACGAA ATCACGGTCA GCAATTTTTG ATAAAATTCT CTGACCTTCT ATTTCTAAAA 3000

TCTTTTGATT TTCTGATTCA CTGGCCTTAT CTGGTGTTTT TTCATCTGAT AACTCAATCA 3060

TTTCAAACTT AGCAAATCTA GAAATTCGTT TTGAATACTC TGCGATACCA TCTTTTAAAT 3120

ACTTTTCTTT CAGTTTCCCA ACTGTTACAA CTTTAATTTT CATGACTCTA TTCTAACATA 3180

TTCTCTATTT TTTCACATCT TATTCACAAA ATAAAAAATA GATTTCAATT AAGAAAATCA 3240

CAATTTCAAA AGAGTTATCC ACAGTTTGTG TAAAACTTTT GTGTTTAAGT TATAATTAAG 3300

CTAGTCAGTT TATACTTTCA GTAATTCAAA CATATGGAGG CAAATATGAA ACATCTAAAA 3360

ACATTTTACA AAAAATGGTT TCAATTATTA GTCGTTATCG TCATTAGCTT TTTTAGTGGA 3420

GCCTTGGGTA GTTTTTCAAT AACTCAACTA ACTCAAAAAA GTAGTGTAAA CAACTCTAAC 3480

AACAATAGTA CTATTACACA AACTGCCTAT AAGAACGAAA ATTCAACAAC ACAGGCTGTT 3540

AACAAAGTAA AAGATGCTGT TGTTTCTGTT ATTACTTATT CGGCAAACAG ACAAAATAGC 3600

GTATTTGGCA ATGATGATAC TGACACAGAT TCTCAGCGAA TCTCTAGTGA AGGATCTGGA 3660

GTTATTTATA AAAAGAATGA TAAAGAAGCT TACATCGTCA CCAACAATCA CGTTATTAAT 3720

GGCGCCAgCA AAGTAGATAT TCGATTGTCA GATGGGACTA AAGTACCTGG AGAAATTGTC 3780

GGAGCTGACA CTTTCTCTGA TATTGCTGTC GTCAAAATCT CTTCAGAAAA AGTGACAACA 3840

GTAGCTGAGT TTGGTGATTC TAGTAAGTTA ACTGTAGGAG AAACTGCTAT TGCCATCGGT 3900

AGCCCGTTAG GTTCTGAATA TGCAAATACT GTCACTCAAG GTATCGTATC CAGTCTCAAT 3960

AGAAATGTAT CCTTAAAATC GGAAGATGGA CAAGCTATTT CTACAAAAGC CATCCAAACT 4020

GATACTGCTA TTAACCCAGG TAACTCTGGC GGCCCACTGA TCAATATTCA AGGGCAGGTT 4080 

ATCGGAATTA CCTCAAGTAA AATTGCTACA AATGGAGGAA CATCTGTAGA AGGTCTTGGT 4140 

TTCGCAATTC CTGCAAATGA TGCTATCAAT ATTATTGAAC AGTTAGAAAA AAACGGAAAA 4200 

GTGACGCGTC CAGCTTTGGG AATCCAGATG GTTAATTTAT CTAATGTGAG TACAAGCGAC 4260 

ATCAGAAGAC TCAATATTCC AAGTAATGTT ACATCTGGTG TAATTGTTCG TTCGGTACAA 4320 

AGTAATATGC CTGCCAATGG TCACCTTGAA AAATACGATG TAATTACAAA AGTAGATGAC 4380 

AAAGAGATTG CTTCATCAAC AGACTTACAA AGTGCTCTTT ACAACCATTC TATCGGAGAC 4440 

ACCATTAAGA TAACCTACTA TCGTAACGGG AAAGAAGAAA CTACCTCTAT CAAACTTAAC 4500 

AAGAGTTCAG GTGATTTAGA ATCTTAATTG ACATCTATGT AAAGAAAGCT TTACATAAGA 4560

GAAAAGATGT GTTAGTGTAG AATCATGGAA AAATTTGAAA TGATTTCTAT CACAGATATA 4620 

CAAAAAAATC CCTATCAACC CCGAAAAGAA TTTGATAGAG AAAAACTAGA TGAACTAGCA 4680 

CAGTCTATCA AAGAAAATGG GGTCATTCAA CCGATTATTG TTCGTCAATC TCCTGTTATT 4740 

GGTTATGAAA TCcTTGCAGG AGAGAGACGC TATCGGGCTT CACTTTTAGC TGGTCTACGG 4800 

TCTATCCCAG CTGTTGTTAA ACAGATTTCA GACCAAGAGA TGATGGTCCA GTCCATTATT 4860 

GAAAATTTAC AGAGAGAAAA TTTAAACCCA ATAGAAGAAG CACGCGCCTA TGAATCTCTC 4920 

GTAGAGAAAG GATTCACCCA TGCTGAAATT GCAGATAAGA TGGGCAAGTC TCGTCCATAT 4980 

ATCAGCAACT CCATTCGTTT ACTTTCCTTG CCAGAACAGA TTCTTTCAGA AGTAGAAAAT 5040 

GGCAAACTAT CACAAGCCCA TGCGCGTTCC CTAGTTGGGT TAAATAAGGA ACAACAAGAC 5100 

TATTTCTTTC AACGGATTAT AGAAGAAGAT ATTTCTGTAA GGAAATTAGA AGCTCTTCTG 5160

ACAGAGAAAA AACAAAAGAA ACAGCAAAAA ACTAATCATT TCATACAAAA TGAAGAAAAA 5220 

CAGTTAAGAA AACTACTCGG ATTAGATGTA GAAATTAAAC TATCTAAAAA AGACAGTGGA 5280 

AAAATCATTA TTTCTTTTTC AAATCAAGAA GAATATAGTA GAATTATCAA CAGCCTGAAA 5340 

TAAGGCTGTT CTTTTATTTT TTTATCTCAC AAGGTTATCC ACTATGTTTT TCGATAAAAA 5400 

GCTTAATAAA TCAATAATTT CTTCTTTTAT CCCCAACCTG TGGATAAAGT TTGGTAACAT 5460 

TGTGGATTAT TTTTCACAGC TTGTGGAAAA TTCTTGCTAT CTATGGTAAA ATATCTCTAG 5520 

TATTAAACTT TTAAATAGTA AAGGAGGAGA AAGGATTGAA AGAAAAACAA TTTTGGAATC 5580 

GTATATTAGA ATTTGCACAA GAAAGACTGA CTCGATCCAT GTATGATTTC TATGCTATTC 5640 

AAGCTGAACT CATCAAGGTA GAGGAAAATG TTGCCACTAT ATTTCTACCT CGCTCTGAAA 5700 

TGGAAATGGT CTGGGAAAAA CAACTAAAAG ATATTATTGT AGTAGCTGGT TTTGAAATTT 5760 
ATGACGCTGA AATAACTCCC CACTATATTT TCACCAAACC TCAAGATACG ACTAGCTCAC 5820

AAGTTGAAGA AGCTACAAAT TTAACTCTTT ATAACTATAG TCCAAAGTTA GTATCTATTC 5880 

CTTATTCAGA TACGGGATTA AAAGAAAAGT ATACCTTTGA TAACTTTATT CAAGGGGATG 5940 

GAAATGTTTG GGCTGTATCA GCCGCTTTAG CTGTCTCTGA AGATTTGGCT CTGACCTATA 6000 

ACCCTCTTTT TATCTATGGA GGACCAGGCC TTGGTAAGAC TCACTTATTA AACGCTATTG 6060 

GAAATGAAAT TCTAAAAAAT ATTCCTAATG CGCGTGTTAA ATATATCCCT GCCGAAAGCT 6120 

TTATTAATGA CTTTCTTGAT CACCTAAGAC TTGGGGAAAT GGAAAAGTTT AAAAAGACCT 6180 

ATCGTAGTCT TGATCTTTTG TTAATCGATG ATATCCAGTC ACTCAGCGGA AAAAAAGTCG 6240 

CAACTCAGGA AGAATTTTTC AATACCTTTA ACGCCCTTCA TGACAAGCAA AAACAGATTG 6300 

TCCTAACGAG TGATCGTAGT CCAAAACATC TAGAAGGGCT CGAGGAGAGG CTTGTCACGC 6360 

GTTTTAGTTG GGGATTGACA CAAACTATCA CCCCCCCTGA CTTTGAAACA CGTATTGCCA 6420 

TTTTACAAAG TAAGACGGAA CATTTAGGCT ACAATTTCCA AAGTGATACT CTAGAATACC 6480 

TAGCTGGGCA ATTTGATTCA AATGTTCGAG ATCTTGAGGG AGCCATCAAC GACATCACTT 6540 

TAATTGCCAG AGTAAAAAAA ATCAAGGATA TCACTATTGA TATTGCTGCA GAAGCCATTA 6600 

GAGCCCGCAA ACAAGATGTT AGCCAAATGC TCGTCATCCC AATTGATAAA ATCCAAACTG 6660 

AAGTTGGTAA CTTTTATGGT GTTAGTATCA AAGAAATGAA GGGAAGTAGA CGCCTTCAAA 6720 

ATATTGTTTT GGCCCGTCAA GTAGCCATGT ATTTATCTAG AGAACTAACA GATAATAGTC 6780 

TTCCAAAAAT TGGGAAGGAA TTTGGGGGAA AAGATCATAC CACAGTCATT CATGCCCATG 6840 

CCAAAATAAA ATCTTTGATT GATCAAGACG ATAATTTACG TTTAGAAATT GAATCAATCA 6900 

AAAAGAAAAT CAAATAATTT GTGGATAACT TTTAGTTTTT TATCTTTTTT ATCCACATTT 6960 

TTTAAACAAG CTAAAAAACT TGATATGACT TGTTTAAAGG CTGTTTTCCA CAGATTTCAC 7020 

AGACTCTATT ATTACTATTA TCTTTCTAAT ACTAAAAATA AATAAAGGAG AATCCATGAT 7080 

TCATTTTTCA ATTAATAAAA ATTTATTTCT ACAAGCATTA AATACTACTA AGAGAGCTAT 7140 

TAGTTCTAAA AATGCCATTC CTATTTTATC AACAGTAAAA ATTGACGTGA CCAATGAAGG 7200 

TATTACTTTA ATTGGTTCAA ATGGTCAAAT TTCAATTGAA AATTTTATTT CTCAAAAAAA 7260 

TGAAGATGCT GGTTTGTTAA TTACTTCTTT AGGTTCGATC CTTCTTGAAG CTTCTTTCTT 7320 

TATCAATGTA GTATCTAGTT TACCTGATGT AACTCTTGAT TTTAAAGAAA TTGAACAAAA 7380 

TCAAATTGTT TTAACCAGTG GCAAATCAGA AATTACCCTA AAAGGAAAAG ATAGCGAACA 7440 

ATATCCACGA ATCCAAGAAA TTTCAGCAAG CACTCCTTTA ATACTTGAAA CAAAATTACT 7500

CAAGAAAATT ATTAATGAAA CAGCCTTTGC TGCAAGTACA CAAGAGAGTC GTCCGATTTT 7560

AACAGGTGTC CACTTCGTAT TGAGTCAACA CAAAGAGTTA AAAACAGTTG CAACAGACTC 7620 

TCATCGCCTA AGCCAGAAAA AATTGACTCT TGAAAAAAAT AGTGATGATT TTGATGTCGT 7680 

AATTCCTAGC CGTTCTCTAC GCGAATTTTC AGCGGTATTT ACAGATGATA TCGAAACTGT 7740 

AGAGATTTTC TTTGCCAATA ACCAAATCCT CTTTAGAAGC GAAAATATTA GCTTCTATAC 7800 

TCGTCTCCTA GAAGGAAACT ATCCTGATAC AGATCGCTTG ATTCCAACAG ACTTTAACAC 7860 

TACTATTACT TTTAATGTGG TAAACTTACG CCAGTCAATG GAGCGTGCCC GTCTTTTATC 7920

AAGTGCGACT CAAAATGGTA CTGTGAAACT TGAAATTAAG GATGGGGTTG TTAGCGCCCA 7980 

TGTTCACTCT CCAGAAGTTG GTAAAGTAAA CGAAGAAATC GATACTGATC AGGTTACTGG 8040 

TGAAGATTTG ACCATTAGTT TCAACCCAAC TTACTTGATT GATTCTCTTA AAGCTTTAAA 8100 

TAGCGAAAAG GTGACTATTA GCTTTATCTC AGCTGTTCGT CCATTTACTC TTGTGCCAGC 8160 

AGATACTGAC GAAGACTTCA TGCAGCTCAT TACACCAGTT CGTACAAATT AAGTGAAAGA 8220 

GGTTGAGCCT GGCTCGCCTC TTTTATGATA TAATCGAAAA AGAAAAGGAG AGTAGTATGT 8280 

ATCAAGTTGG AAATTTTGTT GAGATGAAAA AATCACACGC TTGTACAATC AAGTCGACTG 8340 

GTAAAAAGGC TAATCGTTGG GAAATTACAC GTGTAGGAGC AGATATCAAA ATAAAATGTA 8400 

GTAATTGTGA GCATGTTGTC ATGATGGGGC GATATGATTT TGAGCGAAAA ATGAATAAAA 8460 

TTATTGACTG AGAACCCTTA GTTAGAGGGT TAGCACTTTA TCCCTTTTTG TGTTATAATA 8520 

TTAGGGATTG AAATGAAAAC GGAGAATGAG AAATATGGCT TTGACAGCAG GTATCGTTGG 8580 

TTTGCCAAAC GTTGGTAAAT CAACACTATT TAATGCAATT ACAAAAGCAG GAGCAGAGGC 8640 

AGCAAACTAC CCATTTGCGA CGATTGATCC AAATGTTGGA ATGGTGGAAG TTCCAGATGA 8700 

ACGCCTACAA AAACTAACTG AAATGATAAC TCCTAAAAAG ACAGTTCCCA CAACATTTGA 8760 

ATTTACAGAT ATTGCAGGGA TTGTAAAAGG AGCTTCAAAA GGAGAGGGGC TAGGGAATAA 8820 

ATTCTTGGCC AATATTCGTG AAGTAGATGC GATTGTTCAC GTAGTTCGTG CTTTTGATGA 8880 

TGAAAATGTA ATGCGCGAGC AAGGACGTGA AGACGCCTTT GTAGATCCAC TTGCAGATAT 8940

TGATACCATT AATCTGGAAT TGATTCTTGC TGACTTAGAA TCAGTGAACA AACGATATGC 9000 

GCGTGTAGAA AAGATGGCAC GTACGCAAAA AGATAAAGAA TCAGTAGCAG AATTCAATGT 9060 

TCTTCAAAAG ATTAAACCAG TCCTAGAAGA CGGGAAATCA GCTCGTACCA TTGAATTTAC 9120 

AGATGAGGAA CAAAAGGTTG TCAAAGGTCT TTTCCTTTTG ACGACTAAAC CAGTTCTTTA 9180 

TGTAGCTAAT GTGGACGAGG ATGTGGTTTC AGAACCTGAC TCTATCGACT ATGTCAAACA 9240 

AATTCGTGAA TTTGCAGCGA CAGAAAATGC TGAAGTAGTC GTTATTTCTG CGCGTGCTGA 9300

GGAAGAAATT TCTGAATTGA ATGATGAAGA TAAAAAAGAG TTTCTTGAAG CCATTGGTTT 9360

GACAGAATCA GGTGTAGATA AGTTGACGCG TGCAGCTTAC CACTTGCTTG GATTGGGAAC 9420 

TTACTTCACA GCTGGTGAAA AAGAAGTTCG CGCTTGGACT TTCAAACGTG GTATGAAGGC 9480 

TCCTCAAGCA GCTGGTATTA TCCACTCAGA CTTTGAAAAA GGCTTTATTC GTGCAGTAAC 9540 

CATGTCATAT GAAGATCTAG TGAAATACGG ATCTGAAAAG GCCGTAAAAG AAGCTGGACG 9600 

CTTGCGTGAA GAAGGAAAAG AATATATCGT TCAAGATGGC GATATCATGG AATTCCGCTT 9660 

TAATGTCTAA AAATTAATAA ATGGTGTCAA TTAGGTTGGA AAAAAATTCC AACCCTTTTG 9720 

GCTTTTGAAA GGAAAAATAA ATGACCAAAT TACTTGTAGG CTTGGGAAAT CCAGGGGATA 9780 

AATATTTTGA AACAAAACAC AATGTTGGTT TTATGTTGAT TGATCAACTA GCGAAGAAAC 9840 

AGAATGTCAC TTTTACACAC GATAAGATAT TTCAAGCTGA CCTAGCATCC TTTTTCCTAA 9900 

ATGGAGAAAA AATTTATCTG GTTAAACCAA CGACCTTTAT GAATGAAAGT GGAAAAGCAG 9960 

TTCATGCTTT ATTAACTTAC TATGGTTTGG ATATTGACGA TTTACTTATC ATTTACGATG 10020 

ATCTTGACAT GGAAGTTGGG AAAATTCGTT TAAGAGCAAA AGGCTCAGCA GGTGGTCATA 10080 

ATGGTATCAA GTCTATTATT CAACATATAG GAACTCAGGT CTTTAACCGT GTTAAGATTG 10140 

GAATTGGAAG ACCTAAAAAT GGTATGTCAG TTGTTCATCA TGTTTTGAGT AAGTTTGACA 10200 

GGGATGATTA TATCGGTATT TTACAGTCTG TTGACAAAGT TGACGATTCT GTAAACTACT 10260 

ATTTACAAGA GAAAAATTTT GAGAAAACAA TGCAGAGGTA TAACGGATAA ATGGTGACCT 10320 

TATTAGATTT ATTCTCAGAA AATGATCAGA TTAAAAAATG GCATCAAAAT TTAACAGATA 10380 

AGAAAAGACA ACTAATACTT GGTTTATCAA CATCTACTAA GGCTCTTGCA ATTGCAAGCA 10440 

GTTTAGAAAA AGAAGATAGG ATTGTGTTAT TGACGTCAAC TTATGGAGAA GCAGAAGGAC 10500 

TTGTTAGTGA TCTTATTTCT ATCTTGGGTG AGGAACTCGT CTATCCATTT TTGGTAGATG 10560 

ATGCTCCTAT GGTGGAGTTT TTGATGTCTT CACAGGAAAA AATTATTTCA CGGGTTGAAG 10620 

CCTTGCGTTT TTTGACTGAT TCATCTAAGA AAGGGATTTT AGTTTGTAAT ATCGCAGCAA 10680 

GTCGATTGAT TTTACCGTCT CCCAATGCAT TCAAAGATAG TATTGTAAAA ATCTCAGTTG 10740 

GTGAAGAATA TGATCAACAC GCGTTTATCC ATCAGTTAAA GGAAAATGGC TATCGAAAAG 10800 

TTACTCAAGT ACAAACTCAG GGCGAATTTA GTCTTCGAGG AGATATTTTA GATATTTTTG 10860

AAATATCCCA GTTAGAACCT TGTCGAATTG AGTTTTTTGG TGATGAAATT GATGGTATCA 10920 

GGTCATTTGA AGTAGAAACA CAATTATCGA AAGAAAATAA GACAGAACTC ACTATCTTTC 10980 

CAGCTAGTGA TATGCTTTTG AGAGAAAAGG ATTATCAACG AGGACAGTCA GCTTTAGAAA 11040 

AACAAATTTC AAAAACTTTA TCACCTATTT TGAAATCATA CCTAGAAGAA ATTCTTTCAA 11100

GTTTTCACCA AAAACAAAGT CATGCAGACT CTCGGAAGTT TTTATCTTTG TGCTATGATA 11160

AGACATGGAC TGTCTPTGAT TATATTGAAA AAGATACTCC AATATTCTTT GATGATTATC 11220

AAAAATTGAT GAATCAGTAT GAAGTCTTTG AAAGAGACTT AGCGCAGTAC TTTACAGAAG 11280

AATTACAGAA TAGTAAAGCA TTTTCTGATA TGCAGTATTT TTCTGATATT GAACAAATCT 11340

ATAAAAAACA AAGTCCAGTG ACCTTTTTCT CTAATCTTCA AAAGGGTTTA GGAAATCTCA 11400

AATTTGACAA AATTTATCAA TTCAATCAAT ATCCTATGCA GGAATTTTTC AATCAGTTTT 11460

CTTTTCTAAA AGAAGAAATT GAACGATATA AAAAAATGGA TTACACCATT ATTCTGCAGT 11520

CTAGCAATTC AATGGGAAGT AAAACATTGG AGGATATGTT AGAGGAATAT CAGATTAAAT 11580

TGGATTCTAG AGATAAGACA AATATCTGTA AAGAATCTGT AAACTTAATA GAGGGTAATC 11640

TCAGACATGG TTTTCATTTT GTAGATGAAA AGATTTTATT GATAACTGAA CATGAGATTT 11700

TTCAAAAGAA ATTAAAGCGT CGTTTTCGAA GACAACATGT TTCAAATGCA GAGAGATTAA 11760

AAGATTACAA TGAACTTGAA AAAGGGGACT ATGTTGTCCA TCATATCCAT GGGATTGGTC 11820

AATATCTAGG AATTGAAACC ATTGAAATCA AGGGAATTCA TCGCGATTAT GTCAGTGTCC 11880

AATACCAAAA TGGTGATCAA ATTTCTATCC CCGTGGAACA GATTCATCTA CTGTCCAAAT 11940

ATATTTCAAG TGATGGTAAA GCTCCAAAAC TCAATAAATT AAATGACGGT CATTTTAAAA 12000

AGGCCAAGCA AAAGGTTAAG AACCAGGTAG AGGATATAGC TGATGATTTA ATCAAACTCT 12060

ACTCTGAACG TAGTCAGTTG AAGGGTTTTG CTTTCTCAGC TGATGATGAT GATCAAGATG 12120

CCTTTGATGA TGCTTTCCCT TATGTTGAAA CGGATGATCA ACTTCGTAGT ATTGAGGAAA 12180

TCAAGAGGGA TATGCAGGCT TCTCAGCCAA TGGATCGACT TTTAGTTGGG GATGTTGGTT 12240

TTGGAAAGAC TGAAGTTGCT ATGCGTGCAG CCTTTAAAGC AGTCAATGAT CACAAACAGG 12300

TTGTCATTCT AGTTCCGACG ACGGTTTTAG CGCAACAGCA CTATACGAAT TTTAAGGAAC 12360

GATTCCAAAA TTTTGCAGTT AATATTGATG TGTTGAGTCG CTTTAGAAGT AAAAAAGAGC 12420

AGACTGCAAC ACTTGAAAAA TTGAAAAACG GTCAAGTCGA TATTTTGATT GGAACACATC 12480

GTGTTTTGTC AAAAGATGTT GTGTTTGCTG ATTTGGGCTT GATGATTATT GATGAGGAAC 12540

AGCGATTTGG TGTCAAGCAT AAGGAAACTT TGAAAGAACT GAAGAAACAA GTGGATGTCC 12600

TAACCTTGAC CGCTACGCCA ATCCCTCGTA CCCTCCATAT GTCTATGCTG GGAATCAGAG 12660

ATTTATCTGT TATTGAAACT CCGCCGACTA ATCGCTATCC TGTTCAGACC TATGTTTTGG 12720

AAAAGAATGA TAGTGTCATT CGTGATGCTG TCTTGCGTGA AATGGAGCGT GGAGGTCAAG 12780

TTTATTATCT TTACAACAAA GTTGACACAA TTGTTCAGAA GGTTTCAGAA TTACAGGAGT 12840

TGATTCCGGA GGCTTCGATT GGATATGTTC ATGGTCGAAT GAGTGAAGTC CAGTTGGAAA 12900

ATACTCTATT AGACTTTATT GAGGGACAAT ACGATATCTT GGTGACGACT ACTATTATTG 12960 

AGACAGGGGT GGACATTCCA AATGCTAATA CTTTATTTAT TGAAAATGCG GACCATATGG 13020 

GCTTGTCAAC CTTATATCAG TTAAGAGGAA GAGTCGGTCG TAGTAATCGT ATTGCTTATG 13080 

CTTATCTCAT GTATCGTCCA GAAAAATCAA TCAGTGAAGT CTCTGAAAAG AGATTAGAAG 13140 

CGATTAAAGG ATTTACAGAA TTGGGCTCTG GCTTTAAGAT TGCAATGCGA GATCTTTCGA 13200 

TTCGTGGAGC AGGAAATCTT TTAGGAAAAT CCCAGTCTGG TTTCATTGAT TCTGTTGGTT 13260 

TTGAATTGTA TTCGCAGTTA TTAGAGGAAG CTATTGCTAA ACGAAACGGT AATGCTAACG 13320 

CTAACACAAG AACCAAAGGG AATGCTGAGT TGATTTTGCA AATTGATGCC TATCTTCCTG 13380 

ATACTTATAT TTCTGATCAA CGACATAAGA TTGAAATTTA CAAGAAAATT CGTCAAATTG 13440 

ACAACCGTGT CAATTATGAA GAGTTACAAG AGGAGTTGAT AGACCGTTTT GGAGAATACC 13500 

CAGATGTAGT AGCCTATCTG TTAGAGATTG GTTTGGTCAA ATCATACTTG GACAAGGTCT 13560 

TTGTTCAACG TGTGGAAAGA AAAGATAATA AAATTACAAT TCAATTTGAA AAAGTCACTC 13620 

AACGACTGTT TTTAGCTCAA GATTATTTTA AAGCTTTATC CGTAACGAAC TTAAAAGCAG 13680 

GCATCGCTGA GAATAAGGGA TTAATGGAGC TTGTATTTGA TGTCCAAAAT AAGAAAGATT 13740 

ATGAAATTTT AGAAGGTTTG CTGATTTTTG GAGAAAGTTT ATTAGAGATA AAAGAGTCTA 13800 

AGGAAGAAAA TTCCATTTGA TATTTTTCTT CTATAAAATA GATAAAAATG GTACAATAAT 13860 

AAATTGAGGT AATAAGGATG AGATTAGATA AATATTTAAA AGTATCGCGA ATTATCAAGC 13920 

GTCGTACAGT CGCAAAGGAA GTAGCAGATA AAGGTAGAAT CAAGGTTAAT GGAATCTTGG 13980 

CCAAAAGTTC AACGGACTTG AAAGTTAATG ACCAAGTTGA AATTCGCTTT GGCAATAAGT 14040 

TGCTGCTTGT AAAAGTACTA GAGATGAAAG ATAGTACAAA AAAAGAAGAT GCAGCAGGAA 14100 

TGTATGAAAT TATCAGTGAA ACACGGGTAG AAGAAAATGT CTAAAAATAT TGTACAATTG 14160 

AATAATTCTT TTATTCAAAA TGAATACCAA CGTCGTCGCT ACCTGATGAA AGAACGACAA 14220 

AAACGGAATC GTTTTATGGG AGGGGTATTG ATTTTGATTA TGCTATTATT TATCTTGCCA 14280 

ACTTTTAATT TAGCGCAGAG TTATCAGCAA TTACTCCAAA GACGTCAGCA ATTAGCAGAC 14340 

TTGCAAACTC AGTATCAAAC TTTGAGTGAT GAAAAGGATA AGGAGACAGC ATTTGCTACC 14400 

AAGTTGAAAG ATGAAGATTA TGCTGCTAAA TATACACGAG CGAAGTACTA TTATTCTAAG 14460 

TCGAGGGAAA AAGTTTATAC GATTCCTGAC TTGCTTCAAA GGTGATAAAA TGGAAAATTT 14520 

ATTAGACGTA ATAGAGCAAT TTTTGAGTTT GTCAGATGAA AAGCTGGAAG AATTGGCTGA 14580

TAAAAATCAA TTATTGCGTT TACAAGAAGA AAAGGAAAGG AAGAATGCGT AAATTCTTAA 14640

TTATTTTGTT GCTACCAAGT TTTTTGACCA TTTCAAAAGT CGTTAGCACA GAAAAAGAAG 14700 

TCGTCTATAC TTCGAAAGAA ATTTATTACC TTTCACAATC TGACTTTGGT ATTTATTTTA 14760 

GAGAAAAATT AAGTTCTCCC ATGGTTTATG GAGAGGTTCC TGTTTATGCG AATGAAGATT 14820 

TAGTAGTGGA ATCTGGGAAA TTGACTCCCA AAACAAGTTT TCAAATAACC GAGTGGCGCT 14880 

TAAATAAACA AGGAATTCCA GTATTTAAGC TATCAAATCA TCAATTTATA GCTGCGGACA 14940 

AACGATTTTT ATATGATCAA TCAGAGGTAA CTCCAACAAT AAAAAAAGTA TGGTTAGAAT 15000 

CTGACTTTAA ACTGTACAAT AGTCCTTATG ATTTAAAAGA AGTGAAATCA TCCTTATCAG 15060 

CTTATTCGCA AGTATCAATC GACAAGACCA TGTTTGTAGA AGGAAGAGAA TTTCTACATA 15120

TTGATCAGGC TGGATGGGTA GCTAAAGAAT CAACTTCTGA AGAAGATAAT CGGATGAGTA 15180 

AAGTTCAAGA AATGTTATCT GAAAAATATC AGAAAGATTC TTTCTCTATT TATGTTAAGC 15240 

AACTGACTAC TGGAAAAGAA GCTGGTATCA ATCAAGATGA AAAGATGTAT GCAGCCAGCG 15300 

TTTTGAAACT CTCTTATCTC TATTATACGC AAGAAAAAAT AAATGAGGGT CTTTATCAGT 15360

TAGATACGAC TGTAAAATAC GTATCTGCAG TCAATGATTT TCCAGGTTCT TATAAACCAG 15420 

AGGGAAGTGG TAGTCTTCCT AAAAAAGAAG ATAATAAAGA ATATTCTTTA AAGGATTTAA 15480 

TTACGAAAGT ATCAAAAGAA TCTGATAATG TAGCTCATAA TCTATTGGGA TATTACATTT 15540

CAAACCAATC TGATGCCACA TTCAAATCCA AGATGTCTGC CATTATGGGA GATGATTGGG 15600 

ATCCAAAAGA AAAATTGATT TCTTCTAAGA TGGCCGGGAA GTTTATGGAA GCTATTTATA 15660 

ATCAAAATGG ATTTGTGCTA GAGTCTTTGA CTAAAACAGA TTTTGATAGT CAGCGAATTG 15720 

CCAAAGGTGT TTCTGTTAAA GTAGCTCATA AAATTGGAGA TGCGGATGAA TTTAAGCATG 15780 

ATACGGGTGT TGTCTATGCA GATTCTCCAT TTATTCTTTC TATTTTCACT AAGAATTCTG 15840 

ATTATGATAC GATTTCTAAG ATAGCGAAGG ATGTTTATGA GGTTCTAAAA TGAGGGAACC 15900 

AGATTTTTTA AATCATTTTC TCAAGAAGGG ATATTTCAAA AAGCATGCTA AGGCGGTTCT 15960 

AGCTCTTTCT GGTGGATTAG ATTCCATGTT TCTATTTAAG GTATTGTCTA CTTATCAAAA 16020

AGAGTTAGAG ATTGAATTGA TTCTAGCTCA TGTGAATCAT AAGCAGAGAA TTGAATCAGA 16080 

TTGGGAAGAA AAGGAATTAA GGAAGTTGGC TGCTGAAGCA GAGCTTCCTA TTTATATCAG 16140 

CAATTTTTCA GGAGAATTTT CAGAAGCGCG TGCACGAAAT TTTCGTTATG ATTTTTTTCA 16200 

AGAGGTCATG AAAAAGACAG GTGCGACAGC TTTAGTCACT GCCCACCATG CTGATGATCA 16260 

GGTGGAAACG ATTTTTATGC GCTTGATTCG AGGAACTCGC TTGCGCTATC TATCAGGAAT 16320 

TAAGGAGAAG CAAGTAGTCG GAGAGATAGA AATCATTCGT CCCTTCTTGC ATTTTCAGAA 16380

AAAAGACTTT CCATCAATTT TTCACTTTGA AGATACATCA AATCAGGAGA ATCATTATTT 16440

TCGAAATCGT ATTCGAAATT CTTACTTACC AGAATTGGAA AAAGAAAATC CTCGATTTAG 16500 

GGATGCAATC TTAGGCATTG GCAATGAAAT TTTAGATTAT GATTTGGCAA TAGCTGAATT 16560 

ATCTAACAAT ATTAATGTGG AAGATTTACA GCAGTTATTT TCTTACTCTG AGTCTACACA 16620 

AAGAGTTTTA CTTCAAACTT ATCTGAATCG TTTTCCAGAT TTGAATCTTA CAAAAGCTCA 16680 

GTTTGCTGAA GTTCAGCAGA TTTTAAAATC TAAAAGCCAG TATCGTCATC CGATTAAAAA 16740 

TGGCTATGAA TTGATAAAAG AGTACCAACA GTTTCAGATT TGTAAAATCA GTCCGCAGGC 16800

TGATGAAAAG GAAGATGAAC TTGTGTTACA CTATCAAAAT CAGGTAGCTT ATCAAGGATA 16860 

TTTATTTTCT TTTGGACTTC CATTAGAAGG TGAATTAATT CAACAAATAC CTGTTTCACG 16920 

TGAAACATCC ATACACATTC GTCATCGAAA AACAGGAGAT GTTTTGATTA AAAATGGGCA 16980 

TAGAAAAAAA CTCAGACGTT TATTTATTGA TTTGAAAATC CCTATGGAAA AGAGAAACTC 17040 

TGCTCTTATT ATTGAGCAAT TTGGTGAAAT TGTCTCAATT TTGGGAATTG CGACCAATAA 17100 

TTTGAGTAAA AAAACGAAAA ATGATATAAT GAACACTGTA CTTTATATAG AAAAAATAGA 17160 

TAGGTAAAAA ATGTTAGAAA ACGATATTAA AAAAGTCCTC GTTTCACACG ATGAAATTAC 17220 

AGAAGCAGCT AAAAAACTAG GTGCTCAATT AACTAAAGAC TATGCAGGAA AAAATCCAAT 17280 

CTTAGTTGGG ATTTTAAAAG GATCTATTCC TTTTATGGCT GAATTGGTCA AACATATTGA 17340 

TACACATATT GAAATGGACT TCATGATGGT TTCTAGCTAC CATGGTGGAA CAGCAAGTAG 17400 

TGGTGTTATC AATATTAAAC AAGATGTGAC TCAAGATATC AAAGGAAGAC ATGTTCTATT 17460 

TGTAGAAGAT ATCATTGATA CAGGTCAAAC TTTGAAGAAT TTGCGAGATA TGTTTAAAGA 17520 

AAGAGAAGCA GCTTCTGTTA AAATTGCAAC CTTGTTGGAT AAACCAGAAG GACGTGTTGT 17580 

AGAAATTGAG GCAGACTATA CTTGCTTTAC TATCCCAAAT GAGTTTGTAG TAGGTTATGG 17640 

TTTAGACTAC AAAGAAAATT ATCGTAATCT TCCTTATATT GGAGTATTGA AAGAGGAAGT 17700 

GTATTCAAAT TAGAAAGAAT AATCTTTAAT GAAAAAACAA AATAATGGTT TAATTAAAAA 17760 

TCCTTTTCTA TGGTTATTAT TTATCTTTTT CCTTGTGACA GGATTCCAGT ATTTCTATTC 17820 

TGGGAATAAC TCAGGAGGAA GTCAGCAAAT CAACTATACT GAGTTGGTAC AAGAAATTAC 17880 

CGATGGTAAT GTAAAAGAAT TAACTTACCA ACCAAATGGT AGTGTTATCG AAGTTTCTGG 17940 

TGTCTATAAA AATCCTAAAA CAAGTAAAGA AGAAACAGGT ATTCAGTTTT TCACGCCATC 18000 

TGTTACTAAG GTAGAGAAAT TTACCAGCAC TATTCTTCCT GCAGATACTA CCGTATCAGA 18060 

ATTGCAAAAA CTTGCTACTG ACCATAAAGC AGAAGTAACT GTTAAGCATG AAAGTTCAAG 18120

TGGTATATGG ATTAATCTAC TCGTATCCAT TGTGCCATTT GGAATTCTAT TCTTCTTCCT 18180

ATTCTCTATG ATGGGAAATA TGGGAGGAGG CAATGGCCGT AATCCAATGA GTTTTGGACG 18240

TAGTAAGGCT AAAGCAGCAA ATAAAGAAGA TATTAAAGTA AGATTTTCAG ATGTTGCTGG 18300

AGCTGAGGAA GAAAAACAAG AACTAGTTGA AGTTGTTGAG TTCTTAAAAG ATCCAAAACG 18360

ATTCACAAAA CTTGGAGCCC GTATTCCAGC AGGTGTTCTT TTGGAGGGAC CTCCGGGGAC 18420

AGGTAAAACT TTGCTTGCTA AGGCAGTCGC TGGAGAAGCA GGTGTTCCAT TCTTTAGTAT 18480

CTCAGGTTCT GACTTTGTAG AAATGTTTGT CGGAGTTGGA GCTAGTCGTG TTCGCTCTCT 18540

TTTTGAGGAT GCCAAAAAAG CAGCACCAGC TATCATCTTT ATCGATGAAA TTGATGCTGT 18600

TGGACGTCAA CGTGGAGTCG GTCTCGGCGG AGGTAATGAC GAACGTGAAC AAACCTTGAA 18660

CCAACTTTTG ATTGAGATGG ATGGTTTTGA GGGAAATGAA GGGATTATCG TCATCGCTGC 18720

GACAAACCGT TCAGATGTAC TTGACCCTGC CCTTTTGCGT CCAGGACGTT TTGATAGAAA 18780

AGTATTGGTT GGTCGTCCTG ATGTTAAAGG TCGTGAAGCA ATCTTGAAAG TTCACGCTAA 18840

GAATAAGCCT TTAGCAGAAG ATGTTGATTT GAAATTAGTG GCTCAACAAA CTCCAGGCTT 18900

TGTTGGTGCT GATTTAGAGA ATGTCTTGAA TGAAGCAGCT TTAGTTGCTG CTCGTCGCAA 18960

TAAATCGATA ATTGATGCTT CAGATATTGA TGAAGCAGAA GATAGAGTTA TTGCTGGACC 19020

TTCTAAGAAA GATAAGACAG TTTCACAAAA AGAACGAGAA TTGGTTGCTT ACCATGAGGC 19080

AGGACATACC ATTGTTGGTC TAGTCTTGTC GAATGCTCGC GTTGTCCATA AGGTTACAAT 19140

TGTACCACGC GGCCGTGCAG GCGGATACAT GATTGCACTT CCTAAAGAGG ATCAAATGCT 19200

TCTATCTAAA GAAGATATGA AAGAGCAATT GGCTGGCTTA ATGGGTGGAC GTGTAGCTGA 19260

AGAAATTATC TTTAATGTCC AAACCACAGG AGCTTCAAAC GACTTTGAAC AAGCGACACA 19320

AATGGCACGT GCAATGGTTA CAGAGTACGG TATGAGTGAA AAACTTGGCC CAGTACAATA 19380

TGAAGGAAAC CATGCTATGC TTGGTGCACA GAGTCCTCAA AAATCAATTT CAGAACAAAC 19440

AGCTTATGAA ATTGATGAAG AGGTTCGTTC ATTATTAAAT GAGGCACGAA ATAAAGCTGC 19500

TGAAATTATT CAGTCAAATC GTGAAACTCA CAAGTTAATT GCAGAAGCAT TATTGAAATA 19560

CGAAACATTG GATAGTACAC AAATTAAAGC TCTTTACGAA ACAGGAAAGA TGCCTGAAGC 19620

AGTAGAAGAG GAATCTCATG CACTATCCTA TGATGAAGTA AAGTCAAAAA TGAATGACGA 19680

AAAATAACCC TGAGAGAGGC TGGAGCCTCT CTTTTTTGTG CAGTTTAGGA GCTAAAGGGA 19740

ACAGAATGGA GAAAATGGAA CAAATGTGTT TTCTAATCTG TTAGACTGTA TCTAGAAAGG 19800

GGAAAATTAT GATTAAAGAA TTGTATGAAG AAGTCCAAGG GACTGTGTAT AAGTGTAGAA 19860

ATGAATATTA CCTTCATTTA TGGGAATTGT CGGATTGGGA GCAAGAAGGC ATGCTCTGCT 19920

TACATGAATT GATTAGTAGA GAAGAAGGAC TGGTAGACGA TATTCCACGT TTAAGGAAAT 19980

ATTTCAAGAC CAAGTTTCGA AATCGAATTT TAGACTATAT CCGTAAACAG GAAAGTCAGA 20040

AGCGTAGATA CGATAAAGAA CCCTATGAAG AAGTGGGTGA GATCAGTCAT CGTATAAGTG 20100

AGGGGGGTCT CTGGCTAGAT GATTATTATC TCTTTCATGA AACACTAAGA GATTATAGAA 20160

ACAAACAAAG TAAAGAGAAA CAAGAAGAAC TAGAACGCGT CTTAAGCAAT GAACGATTTC 20220

GAGGGCGTCA AAGAGTATTA AGAGACTTAC GCATTGTGTT TAAGGAGTTT ACTATCCGTA 20280

CCCACTAGTA AGTCATGCAA AAAAAATGAA AAAAATTAGA AAAAGTAGTT GACAAAGTTT 20340

GAAAAGGCTG TATAATAGTA AGAGTTGAAA ATAACAACTC AGGTCCGTTG GTCAAGGGGT 20400

TAAGACACCG CCTTTTCACG GCGGTAACAC GGGTTCGAAT CCCGTACGGA CTATGGTATG 20460

TTGCGTCAGG ACCACTTGAT GAAAAAAAGT TTAAAAAAAC TTAAAAATCT TCAAAAAAGT 20520

GTTGACAAGC GAAAGCAGTT GTGATATACT AATATAGTTG TCGCTTGAGA GAAGCAAGTG 20580

ACAAAGACCT TTGAAAACTG AACAAGACGA ACCAATGTGC AGGGCGCTAC AACGTAAGTT 20640

GTAGTACTGA ACAATGAAAA AAACAATAAA TCTGTCAGTG ACAGAAATGA GTAAGAACTC 20700

AAACTTTTTA ATGAGAGTTT GATCCTGGCT CAGGACGAAC GCTGGCGGCG TGCCTAATAC 20760

ATGCAAGTAG AACGCTGAAG GAGGAGCTTG CTTCTCTGGA TGAGTTGCGA ACGGGTGAGT 20820

AACGCGTAGG TAACCTGCCT GGTAGCGGGG GATAACTATT GGAAACGATA GCTAATACCG 20880

CATAAGAGTA GATGTTGCAT GACATTTGCT TAAAAGGTGC ACTTGCATCA CTACCAGATG 20940

GACCTGCGTT GTATTAGCTA GTTGGTGGGG TAACGGCTCA CCAAGGCGAC GATACATAGC 21000

CGACCTGAGA GGGTGATCGG CCACACTGGG ACTGAGACAC GGCCCAGACT CCTACGGGAG 21060

GCAGCAGTAG GGAATCTTCG GCAATGGACG GAAGTCTGAC CGAGCAACGC CGCGTGAGTG 21120

AAGAAGGTTT TCGGATCGTA AAGCTCTGTT GTAAGAGAAG AACGAGTGTG AGAGTGGAAA 21180

GTTCACACTG TGACGGTATC TTACCAGAAA GGGACGGCTA ACTACGTGCC AGCAGCCGCG 21240

GTAATACGTA GGTCCCGAGC GTTGTCCGGA TTTATTGGGC GTAAAGCGAG CGCAGGCGGT 21300

TAGATAAGTC TGAAGTTAAA GGCTGTGGCT TAACCATA 21338



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6273 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

TGTTTTTAAA GAGCCGTGTC TGGATAGACT TTCGGACGCA ACGCTCTATT AGATAATGAA 60 

CTGCCTATAC ACAAGATTTC TAACCTTAGT CGACATGAGC TGAAACCTCT TATTTGTTAA 120 

GTAGTTCACA AAATATTATA CACCTATTTT ATGAATAGTC AACTGTCTTT ACAGTAAAAT 180 

TTTAGAAAAT CATGAAAATT TTCTCTTTCT TTCCATTTTA AGTGACATTC AGTCATTCTC 240 

ACATCAAAAA AGCCCAGACG AAATTGTCTG AGCATTCTTT TATCTAGTCG TTTAAGGAAG 300 

TTGAGTTCAG TATGTTTAAA GTCTCTGTCC CATCATTTCT TCAACAAACC TTGTTCTTGG 360 

AGAAACTCCT TGGCTACTTG CTTTGCTGAC TTGCCTTCAA CACCGACTTG GTAGTTGAGC 420 

TGGCTCATCT GGCTTTCTGT AATCTTACCA GCCAATGTAT TAAGAACTCT TTCCAACTCT 480 

GGGTGTTTCT TGAGAAGAGC TTCTTTCATG AGTGGAGCCC CTTGATAAGG TGGGAAGAGT 540 

TGCTTGTCAT CTTCCAAGAC CTGTAAATCA TAACGCTCCA ATTCCGCATC AGTCGAATAG 600 

GCATCCGTGA TTTGAATATC CCCTGACTGA ATAGCCTGAT AGCGAAGGGC TGGCTCAATG 660 

GTCGCTACAT TGAGATTGAG ACCATACATT GATTGCAAGC CCTTATTTCC ATCTTCACGG 720 

TCGTTAAACT CGAGTGTAAA ACCTGCCTTC AACTGCCCTT CCACTTTTTT CAAGTCTGAA 780 

ATGGTCTTCA AGCCATATTC TTGAGCAATC TTTTTCGGAA CAGCTACAGC ATAGGTGTTT 840 

TGATAAGACA TGGGTTTGAG ATAGGCTAGA TGATCCTGCT TAGCAATGCC ATCACGCGCC 900 

ACCTGATAAA CCTGTTCTGG TTCATGACTC ACCTTGGGTG ATGGTTGAAG CAAACTTTCA 960 

GTCACCGTAC CAGTAAATTC AGGATAGATG TCAATATCGC CTTTTTTCAG AGCTTCATAA 1020 

AGGAAGCTTG TCTTCCCAAA ATTCGGTTTA ACAGTCGCAG TCATGCTGGT ATTTTCTTCA 1080 

ATCAGCAACT TATACATATT GGCCAAAATT TCTGGTTCTG GACCTATTTT CCCAGCAATA 1140 

ACCAAGTTTT CCTTCTCTTT TTGAACCAAA AGAGCTGGAC TATAAGACAG ACCCAGTAAT 1200 

AAAGCCACCA AGGCAAAACC TGAGAAAATC GTCCGTAATT TTGCTTTTTC CATCACTTTT 1260 

AGTAGGAAGT TAAAGGCAAT GGCTAGCACT GCAGAAGAAA GTGCCCCAAT CAAAATCAAA 1320 

CTGGCATTAT TACGGTCAAT TCCCAAAAGA ATAAAGGAAC CTAGTCCCCC TGCACCAATC 1380

AAGGCCGCCA AGGTTGCCGT ACCGATAATC AAAACAGCTG CCGTCCGAAT CCCAGACATG 1440 

ATAACAGGCA TGGCGAGTGG AATTTCAAAT TTCTTGAGAC GTTCCCATCT GGTCATCCCA 1500 

AAGGCAATCC CAGCCTCTTG CAGGTTCGGA TCAATTCCCT TCAGCCCAGT GATAGTATTT 1560 

TGCAAAATAG GGAAAATCGC ATAAATCACT AGAGCTGTCA AAGCCGGCAA GGTCCCAATT 1620 

CCCATCAAAG GGATAAAGAG CCCCAACAAG GCCAGAGACG GGATGGTCTG GAAAATACCT 1680 

GCAATCTGCA AGACCCAGTC GGCCAGCTTC TCATGATAGC GAAGAAAAAC AGCCAAGGGA 1740

ATCGCAAGCA AAATAGCTAG TAACAAGGTC AAAAGCGACA ACTGCAAATG TTGAGATAGA 1800

GCTGTCAACC AATCACTAAA ACGATCCTGA AAAGTTGCAA TTAAATTAGT CATGAACACT 1860

ACCTCCAAAC AAGTCTGCTA CAAAGTCTGT TGCAGGCGCT TTTAAAATTG TCTCGGGATT 1920 

CGCTACCTGG CGAATTTCTC CATCCTGCAA GACAGCAATA CGGTCCGCCA ACTTCAAGGC 1980 

TTCATCCGTA TCATGGGTTA CAAAAATCGT TGTCATCCCA AACTCTTTAT GCAATTCTTT 2040

TGTCAGAACC TGCAACTGTT TTCTCGAAAT AGCATCCAAG GCCGAAAAGG GTTCATCCAT 2100 

GAGGAAAATC TTGGGCTGAC CAATCATAGC TCGGACAATA CCGACCCGTT GCTGTTCTCC 2160 

ACCAGATAAT TCACTAGGTA AGCGATGCCC ATACTCGGCT ACTGGTAAAC CAACCTTAGC 2220 

CAAAAGCTCT TCTGTTTTCT TCGTAATTTC TTCCTTGCTC CACCCCTTCA TTTCAGGAAT 2280 

GAGAGCAATA TTTTCCGCAA CTGTTAGATT TGGAAAAAGA GCAATAGCCT GTAAAACATA 2340 

ACCAGTAGAA AGACGAAGTT CACGCTCATC ATAGTCTTTG ATGCGCTTCC CATCCATATA 2400 

AATATTTCCA TCAGTTGGTT CCAAAAGACG GTTAATCATC TTGAGCATGG TCGTCTTACC 2460

TGACCCAGAA GGCCCTACTA AAACCATAAA TTCCCCATCC TCAATCTGTA AGTTGACATC 2520 

TCTCAAGACA TCCTTTTCTG TGTAGCGCAG TGCTACATTT TTGTATTCAA TCATTCTTTG 2580 

TCCTCAATTT AAAACTTCCC TCGATTGGTC AAGTCTTCTA CCTTAGGCAT AACTTCCTTA 2640 

TTATCCCAAT GCTCCACAAT TTTCCCGTTC TCTAAACGGA AGATATCGTA CTGGGCATAA 2700 

GCAACGCCAT CAATCTGAGT CTGACCATAG CTAACCACAT AGTTTCCTTG TCCTAAGAGT 2760 

TGGAAAACAA AGTCAAAAGT GACACTATAT TCAGCCACAT AGTTTTTATA AGCAGCACTT 2820 

CCTTGTCCAA TATCATGATT ATGCTGAATC AAATCGTCTG CCACATAATC ACTCCACTGC 2880 

TCTAGCTCCC CATTTTGGAA AATTTCTGTC AAGAAACGGC GAACCAGCTT TTTATTTTCT 2940 

GCTTTCTTAT CCAAATCCTT GATTTCAAAA TCTCCAAAAA TTTGATCTAG TTGGTCATTT 3000 

TCAGGTGTTC GATAGTAGTC AATGACATCC CAATGCTCAA CAATACAACC ATTCTCATCC 3060 

TCACGGAAAG TATCCGTCGT CACCCATTGA GCTTCTCCAC CATTCAGATA TTGATGAACA 3120 

TGAACAAAGA CCAGATTGCC ATCCTCAATG GTGCGGACAA TCTTAATCTG ACGCTCTGGA 3180 

TGACGCTCAA AGAAATCTGC AAAGAAGGCT GCAAATCCTT CTTTCCCGTC AGGAACACCT 3240 

GTCGAATGTT GGATATAGGT ATCCCCTACA GACTGGGCTT GAGCCTCAGC AACTCGTCCG 3300 

TCTTGAATGG CATGGATGTA TAGGTTGTGA GCATTTTTCA CTTGTTGTGA CATATTCTAA 3360 

ACCTCATTTC CCTTCTCTTT CAGATTCGCC AAAATTCTTT CTTGAAAACC TTCAAATTGG 3420 

TGAATTTCTT CCTCTGAAAA TCCTTTGTAA AAGATAGTAT CCAATTTCTG ACTGACACGA 3480

TGCCCCACTT CTTTCTGGGA CTTGCCTAAC TCCGTTAAAA CTAAATACTT CTTACGCTTG 3540

TCTTTTCCAC ACGGACTAAC AATTACAAGC TTTTGTTCCT CTAGCTTTTT TATCATAGTC 3600 

GTCAGCGTAT TATTCGCAAG TCCAGTCGCA AGCGCGATAT CTGTCGCAGT TGCGCAGCCA 3660 

GTTTCACTAT TCCATAAAAC CGCTAAAATC TTGCCCTGTT CACCCCTATA AAGAGCCTCA 3720 

GGATCTTGAC TCAGTAACTT TTGAAAAATC CGCCCATTCA ACAAACGAAT ATGATGGGCT 3780 

AGCAAATGAC CATCTTTCAT AACACCTCCA ATTTATTTCG ATATCGAAAT GAATAAAACA 3840 

ATTGTAACAC TCATCGTTCT AACTGTCAAC TATTTCGATT TAGAAATAAT TTTTGATAAT 3900 

TATCCACACC ACCATACTCC GGCTCAACTA ACTTTTAACG AGAGTTTCTA AACTCCTTCG 3960 

TCCTCCAGTC TACAAAAGCC TTCCATTCGT ACTATCCTAT ATTTTATGAG GGGACACATT 4020 

TTTCCTATCA GACCATTTAT TTTAAAGATA GAAGTAAATC ATAATTGCTT CCATCTGTTC 4080 

TTTTATAGTA TATTGAAGTT AGACTAGAGC ACTGTATCTT CTAAAACATT GATAGAAAGC 4140 

GATTTGAATT TCCCAATCAA TTTGTTCGTA TTTATAGCAT TTCGAAACTG GAATAGGACA 4200 

CCATGACTGC TAAAAGATTT CTATAAATTC ATTTAATTTC CTCAATCAAT TTGTTCATAT 4260 

CTTATTTCAT TCCGCTATAA TTTCACCTTA CCCTATCTTT TTCGTAGCAC CCTTCAAACA 4320 

GCCTATCCCC TACCGTTTGA CGATTCCTCA CTTCGCTCCA CTTCCATTAC AGAAGTTTCT 4380 

TCACTACTAT GGGCTCGGCT GACTTCTCAT GATTCCTTGT TACTACTATT TGAACGCTCA 4440

CGAGATAGAT CTTACAAAAA ATGCTTTGAT CCACAATGGA ATCAAAGCAT TTTAAAGAGT 4500 

TCCTCATACA TAAGCGCAGA AGTCGCAGTT CCTCTGTACT TGGCTTCTTC TCTTTTGACA 4560 

AAGCGAGCCA AGTTGAGCAA CTCAGGTGCT GGATGTTTGG GATTTAGGAG CAATTCACGA 4620 

TTGACCAGGC CTGAGAGACG AACTGCCTGC AATTGCTCAT TTGTAGTAGG CAGTTTTTTA 4680 

GTAGTCTCTA GGAGAGCAGC AACTAAATCT TCACTCAAAT CATGTCGAGC ATGATTGTAA 4740 

AGATCTTTTA TAAGGCTTTC TAGGTTTGGT TCTACCATCC CTACCACCTC CCTTATGGTT 4800 

TAATAATGTT TAATCAAATC AACCGTTGAA CGATCCAATT TCTTCACCAA GGCTTGTAAG 4860 

AAAGCTTGCG CTTCTAGGAA GTCATCCATT GCATAGAGGG TTTGGTGAGA ATGGATATAA 4920 

CGAGCGCAGA CACCGATAGT TGTTGATGGG ACACCACCAT TTTTCAGATG AGCTGCACCT 4980 

GCATCTGTTC CGCCTTTACC ACAGTAGTAT TGGTACTTGA TACCAGCTTC TTCAGCCGTT 5040 

GTCAAAAGGA AATCCTTCAT CCCTGGGAGA AGCAAGTGAC CTGGATCATA GAAACGAATC 5100 

AAGGTTCCAT CTCCAATCTT GCCTTGACCA CCGTAGACAT CACCTGCTGG TGAGCAATCA 5160 

ACTGCGAGGA AGACTTCTGG GTCAAACTTG GTTGTAGAGG TATGAGCGCC ACGCAGACCA 5220 

ACTTCTTCTT GGACGTTAGA ACCCAGATAG AGTTCATTGC CGAGTTTTTG ACCCGATAAA 5280

GCTTCAGCTA GCTCGCTTAC CATGAGGACA CCGTAGCGGT TATCCCAAGC TTTTGAGATG 5340

ATATTTTTTT CATTGGCTGT CAAAATTGCA GAACTATCTG GTACAATGGT ATCACCAGGA 5400 

CGGATGCCAA AACTTTCTGC CTCAGCCTTG TCCGCAAAAC CACCATCAAA AACGATATCG 5460 

GCAATGGCTG GCATGGTTGG TCCCCCCTTT CCACGAGTCA AATGCGGAGG AACAGAACCT 5520 

GAAATCACAG GAATTTCATG ACCATCACGA GTCAAGAGTT TGAAACGTTG GCTGCTAACC 5580 

ACCATGGGGT TCCAGCCACC GATTTCTACG ACACGGAAGG TACCATCTGG CTTGATTTCG 5640 

CTGACCATAA AACCAACTTC GTCCATATGA GAAGCGACCA AGACGCGCGG TGCATCCACA 5700 

GCTTCTGAAT GTTTGATACC AAAAATACCA CCCAAGCCAT CTGTCACCAC TTCATCCACA 5760 

TGCGGTGTCA ACTTTTCACG AAGATAAGCA CGGACAGGCG CTTCATGACC TGAGACTGCA 5820 

GCAAGTTCTG TTACTTCTTT AATTTTTGAA AATAATGTTG TCATTTCAGT TCCTTCTTTC 5880 

TTTCATCCAT TTTACCACTT TTTATAGGAG AAGGATAGTG GGAAGGTGGA TTTCTAAGTT 5940 

AGTATCTTAG TCCTGCTCTA TCTTAGAAAA GGATAGTATT CTCTTGCATG TAGTGCAAAA 6000 

TCTAGTAAAC ATTCCAAAAT TAACTCGAAT ATTTATTTCC AAACAAAAAA ACAATACACC 6060 

ATCAAAGTTG TTTGGATTTT TCATGAAATT TACAGAAAAT AGTTGACTTC CCTTTCTTCT 6120 

TTCTTTAAAT ATATAGTTGG TTGAGTTTGG AATAGTACGC TGTAGCTGCT AAAACATTTC 6180 

TAGAAATTAA TTTGACTTTC CTAATAGAGT TGTTCATATC TTATTTCAAT TTACTATAGT 6240 

ACAAAACTAG AAAAGGAAAA AATCATGACC AGG 6273 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28171 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

ACAACCTTTT TCAAAAACTC ACCTTGGTAC GGAGATGTTT TGCTTTCTGC TATTATTTTC 60 

GGTTATATTC ATATCAATTT TGCTTTAACT CCTCTTGCTT TTTTCATTTA TGCTAGTGGA 120 

GGTCTTATTT TAGCTCTATT GTATCGCATG ACTAAAAATC TCTACTATCC AATACTAGTT 180 

CATATTCTCA TTAATATCAC TGCCTTCTGG GATGTGTGGT TGCTCCTATT TTCAGGAAGT 240 

TAGCTTACTA AAATAATGTC GGAACTTTCC GGCATTTTCT TTTTTCACAA ATAGTCAACG 300 

TTTTTCTTTT CGATATTGTA GTGGTGTGTA TCCAGTTATT TTTTTGAATT GATTTTGAAA 360

ATAAGGTTGA CTTGAGAAAG GCAGATAGTG AAGATAGTTA AGAAGAATAG GATGTTCTTT 420

TTTCCTTTTT GGAAAACTTC TAAAATATGG TATAATGAAA AGATAAAGAA GTTGGGGGTA 480 

GAAGATGAAC ATTCAACAAT TACGCTATGT TGTGGCTATT GCCAATAGTG GTACTTTTCG 540 

TGAAGCTGCT GAAAAGATGT ATGTTAGTCA GCCGAGTCTG TCTATTTCTG TTCGTGATTT 600

GGAAAAAGAG TTGGGCTTTA AGATTTTCCG TCGGACCAGC TCAGGGACTT TCTTGACCCG 660 

TCGTGGGATG GAATTTTATG AAAAATCGCA AGAATTGGTT AAAGGATTTG ATATTTTTCA 720 

AAATCAGTAT GCCAATCCTG AAGAAGAAAA AGATGAATTT TCTGTTGCTA GCCAGCACTA 780 

TGACTTCTTG CCACCAACTA TTACGGCCTT TTCAGAGCGC TATCCTGACT ATAAGAACTT 840 

CCGTATTTTT GAATCAACTA CTGTTCAAAT ATTAGATGAA GTGGCGCAAG GGCATAGTGA 900 

GATTGGGATT ATCTACCTCA ACAATCAAAA TAAAAAGGGG ATTATGCAAC GGGTTGAAAA 960 

ATTAGGTCTG GAGGTCATCG AATTGATTCC TTTCCATACC CATATTTATC TCCGTGAGGG 1020 

TCATCCTTTA GCCCAGAAAG AGGAATTAGT CATGGAGGAT TTAGCGGATT TACCAACGGT 1080 

TCGTTTCACT CAAGAGAAAG ACGAGTACCT TTATTATTCA CAGAACTTTG TCGATACCAG 1140 

CGCTAGCTCA CAGATGTTTA ATGTGACAGA CCGTGCCACC TTGAATGGTA TTTTGGAGCG 1200 

GACGGACGCC TATGCGACAG GTTCTGGATT TTTAGATAGT GACAGTGTTA ATGGCATTAC 1260 

AGTTATTCGT CTCAAGGATA ACCTAGATAA CCGCATGGTC TATGTTAAAC GTGAAGAAGT 1320 

GGAGCTTAGT CAAGCTGGGA CTCTCTTCGT AGAAGTCATG CAAGAATATT TTGATCAAAA 1380 

GAGGAAATCA TGAAAAAAAG AGCAATAGTG GCAGTCATTG TACTGCTTTT GATTGGGCTG 1440 

GATCAGTTGG TCAAATCCTA TATCGTCCAG CAGATTCCAC TGGGTGAAGT GCGCTCCTGG 1500 

ATCCCCAATT TCGTTAGCTT GACCTACCTG CAAAATCGAG GTGCAGCCTT TTCTATCTTA 1560 

CAAGATCAGC AGCTGTTATT CGCTGTCATT ACTCTGGTTG TCGTGATAGG TGCCATTTGG 1620 

TATTTACATA AACACATGGA GGACTCATTC TGGATGGTCT TGGGTTTGAC TCTAATAATC 1680 

GCGGGTGGTC TTGGAAACTT TATTGACAGG <5TCAGTCAGG GCTTTGTTGT GGATATGTTC 1740 

CACCTTGACT TTATCAACTT TGCAATTTTC AATGTGGCAG ATAGCTATCT GACGGTTGGA 1800 

GTGATTATTT TATTGATTGC AATGCTAAAA GAGGAAATAA ATGGAAATTA AAATTGAAAC 1860 

TGGTGGTCTG CGTTTGGATA AGGCTTTGTC AGATTTGTCA GAATTATCAC GTAGTCTCGC 1920 

GAATGAACAA ATTAAATCAG GCCAGGTCTT GGTCAATGGT CAAGTCAAGA AAGCTAAATA 1980 

CACAGTCCAA GAGGGTGATG TCGTCACTTA CCATGTGCCA GAACCAGAGG TATTAGAGTA 2040 

TGTGGCTGAG GATCTTCCGC TAGAAATAGT CTACCAAGAT GAGGATGTGG CTGTCGTTAA 2100 

CAAACCTCAG GGAATGGTTG TGCACCCGAG TGCTGGTCAT ACCAGTGGAA CCCTAGTAAA 2160

TGCCCTCATG TATCATATTA AGGACTTGTC GGGTATCAAT GGGGTTCTGC GTCCAGGGAT 2220

TGTTCACCGT ATTGATAAGG ATACGTCAGG TCTTCTCATG ATTGCTAAAA ACGATGATGC 2280 

GCATCTAGCA CTTGCCCAAG AACTCAAGGA TAAAAAGTCT CTCCGCAAAT ATTGGGCGAT 2340 

TGTTCATGGA AATCTACCTA ATGATCGTGG TGTAATTGAA GCGCCGATTG GCCGGAGTGA 2400 

AAAAGACCGT AAGAAACAGG CTGTAACTGC TAAAGGGAAG CCTGCAGTGA CGCGTTTTCA 2460 

CGTCTTGGAA CGCTTTGGCG ATTATAGCTT AGTAGAGTTG CAACTGGAGA CAGGGCGCAC 2520 

TCATCAAATC OGTGTCCACA TGGCTTATAT CGGCCATCCA GTCGCTGGTG ATGAGGTCTA 2580 

TGGTCCTCGC AAGACTTTGA AAGGACATGG ACAATTTCTT CATGCCAAGA CTTTAGGTTT 2640 

TACTCATCCG AGAACAGGTA AGACCTTGGA ATTTAAAGCA GATATCCCAG AGATTTTTAA 2700 

GGAAACCTTG GAGAGATTGA GAAAGTAAGA ATGAAAAAGA AATTAACTAG TTTAGCACTT 2760 

GTAGGCGCTT TTTTAGGTTT GTCATGGTAT GGGAATGTTC AGGCTCAAGA AAGTTCAGGA 2820 

AATAAAATCC ACTTTATCAA TGTTCAAGAA GGTGGCAGTG ATGCGATTAT TCTTGAAAGC 2880 

AATGGACATT TTGCCATGGT GGATACAGGA GAAGATTATG ATTTCCCAGA TGGAAGTGAT 2940 

TCTCGCTATC CATGGAGAGA AGGAATTGAA ACGTCTTATA AGCATGTTCT AACAGACCGT 3000 

GTCTTTCGTC GTTTGAAGGA ATTGGGTGTC CAAAAACTTG ATTTTATTTT GGTGACCCAT 3060 

ACCCACAGTG ATCATATTGG AAATGTTGAT GAATTACTGT CTACCTATCC AGTTGACCGA 3120 

GTCTATCTTA AGAAATATAG TGATAGTCGT ATTACTAATT CTGAACGTCT ATGGGATAAT 3180 

CTGTATGGCT ATGATAAGGT TTTACAGACT GCTGCAGAAA AAGGTGTTTC AGTTATTCAA 3240 

AATATCACAC AAGGGGATGC TCATTTTCAG TTTGGGGACA TGGATATTCA GCTCTATAAT 3300 

TATGAAAATG AAACTGATTC ATCGGGTGAA TTAAAGAAAA TTTGGGATGA CAATTCCAAT 3360 

TCCTTGATTA GCGTGGTGAA AGTCAATGGC AAGAAAATTT ACCTTGGGGG CGATTTAGAT 3420 

AATGTTCATG GAGCAGAAGA CAAGTATGGT CCTCTCATTG GAAAAGTTGA TTTGATGAAG 3480 

TTTAATCATC ACCATGATAC CAACAAATCA AATACCAAGG ATTTCATTAA AAATTTGAGT 3540 

CCGAGTTTGA TTGTTCAAAC TTCGGATAGT CTACCTTGGA AAAATGGTGT TGATAGTGAG 3600 

TATGTTAATT GGCTCAAAGA ACGAGGAATT GAGAGAATCA ACGCAGCCAG CAAAGACTAT 3660 

GATGCAACAG TTTTTGATAT TCGAAAAGAC GGTTTTGTCA ATATTTCAAC ATCCTACAAG 3720 

CCGATTCCAA GTTTTCAAGC TGGTTGGCAT AAGAGTGCAT ATGGGAACTG GTGGTATCAA 3780 

GCGCCTGATT CTACAGGAGA GTATGCTGTC GGTTGGAATG AAATCGAAGG TGAATGGTAT 3840 

TACTTTAACC AAACGGGTAT CTTGTTACAG AATCAATGGA AAAAATGGAA CAATCATTGG 3900 
TTCTATTTGA CAGACTCTGG TGCTTCTGCT AAAAATTGGA AGAAAATCGC TGGAATCTGG 3960 

TATTATTTTA ACAAAGAAAA CCAGATGGAA ATTGGTTGGA TTCAAGATAA AGAGCAGTGG 4020 

TATTATTTGG ATGTTGATGG TTCTATGAAG ACAGGATGGC TTCAATATAT GGGGCAATGG 4080 

TATTACTTTG CTCCATCAGG GGAAATGAAA ATGGGCTGGG TAAAAGATAA AGAAACCTGG 4140 

TACTATATGG ATTCTACTGG TGTCATGAAG ACAGGTGAGA TAGAAGTTGC TGGTCAACAT 4200 

TATTATCTGG AAGATTCAGG AGCTATGAAG CAAGGCTGGC ATAAAAAGGC AAATGATTGG 4260 

TATTTCTACA AGACAGACGG TTCACGAGCT GTGGGTTGGA TCAAGGACAA GGATAAATGG 4320 

TACTTCTTGA AAGAAAATGG TCAATTACTT GTGAACGGTA AGACACCAGA AGGTTATACT 4380 

GTGGATTCAA GTGGTGCCTG GTTAGTGGAT GTTTCGATCG AGAAATCTGC TACAATTAAA 4440 

ACTACAAGTC ATTCAGAAAT AAAAGAATCC AAAGAAGTAG TGAAAAAGGA TCTTGAAAAT 4500 

AAAGAAACGA GTCAACATGA AAGTGTTACA AATTTTTCAA CTAGTCAAGA TTTGACATCC 4560 

TCAACTTCAC AAAGCTCTGA AACGAGTGTA AACAAATCGG AATCAGAACA GTAGTAGAAA 4620 

AGAAGGTTTT AGGGCCTTCT TTTTCCTATC AACTCTTTTC TATTTCCTGT TATTCATGTT 4680 

ATAATGGATA AATATGAATA ATCGGAGTGA GACTATGAAA TACAAACGGA TTGTCTTTAA 4740 

GGTGGGTACT TCTTCTCTGA CAAATGAGGA TGGAAGTTTA TCACGTAGTA AGGTAAAGGA 4800 

TATTACCCAG CAGTTGGCTA TGCTGCACGA GGCTGGTCAT GAGTTGATTT TGGTGTCTTC 4860 

AGGTGCCATT GCGGCTGGTT TTGGAGCCTT AGGATTTAAA AAGCGTCCGA CTAAGATTGC 4920 

TGATAAACAG GCTTCAGCAG CGGTAGGGCA GGGGCTTTTG TTGGAAGAAT ATACAACCAA 4980 

TCTTCTCTTG CGTCAAATCG TTTCTGCACA AATCTTGCTG ACCCAAGATG ACTTTGTGGA 5040 

TAAGCGTCGT TATAAAAATG CCCATCAGGC TTTGTCGGTT TTGCTCAACC GTGGGGCAAT 5100 

TCCTATCATC AATGAGAATG ATAGTGTCGT TATTGATGAG CTCAAGGTTG GGGACAATGA 5160 

CACTCTAAGT GCTCAAGTAG CGGCGATGGT CCAAGCAGAC CTTTTAGTTT 5220 

TGTGGACGGT CTCTATACTG GAAATCCTAA TTCAGATCCA AGAGCCAAAC GCTTGGAGAG 5280 

AATCGAGACC ATCAATCGTG AGATTATTGA TATGGCTGGT GGAGCTGGTT CGTCAAACGG 5340 

AACTGGGGGT ATGTTAACCA AAATCAAGGC TGCAACTATC GCGACGGAAT CAGGAGTTCC 5400 

TGTTTATATC TGCTCATCCT TGAAATCAGA TTCCATGATT GAGGCGGCAG AGGAGACCGA 5460 

GGATGGTTCT TACTTTGTTG CTCAAGAGAA GGGGCTTCGT ACCCAGAAAC AATGGCTTGC 5520 

CTTCTATGCT CAGAGTCAAG GTTCTATTTG GGTTGATAAA GGGGCTGCGG AAGCTCTCTC 5580 

TCAATATGGA AAGAGTCTTC TCTTATCTGG TATCGTTGAA GCAGAAGGAG TCTTTTCTTA 5640 

CGGTGATATC GTGACAGTAT TTGACAAGGA AAGTGGAAAA TCACTTGGAA AAGGACGCGT 5700 

GCAATTTGGA GCATCTGCTT TGGAGGATAT GTTGCGTTCT CAAAAAGCCA AGGGTGTCTT 5760 

GATTTACCGT GACGACTGGA TTTCCATTAC TCCTGAAATC CAACTACTTT TTACAGAATT 5820 

TTAGAGGTAA ACTATGGTGA GTAGACAAGA ACAATTTGAA CAGGTACAGG CTGTTAAAAA 5880 

ATCGATTAAC ACAGCTAGTG AAGAAGTGAA AAACCAAGCC TTGCTAGCCA TGGCTGATCA 5940 

CTTAGTGGCT GCTACTGAGG AAATTTTAGC GGCTAATGCC CTCGATATGG CAGCGGCTAA 6000 

GGGGAAAATC TCAGATGTGA TGTTGGATCG TCTTTATTTG GATGCAGATC GTATAGAAGC 6060 

GATGGCAAGA GGAATTCGTG AAGTGGTTGC CTTACCAGAT CCAATCGGTG AAGTTTTAGA 6120 

AACAAGTCAG CTTGAAAATG GTTTGGTTAT CACAAAAAAA CGTGTAGCTA TGGGTGTCAT 6180 

CGGTATTATC TATGAAAGCC GTCCAAATGT GACGTCTGAT GCGGCTGCTT TGACTCTTAA 6240 

GAGTGGAAAT GCGGTTGTTC TTCGTAGTGG TAAGGATGCC TATCAAACAA CCCATGCCAT 6300 
TGTCACAGCC TTGAAGAAGG GCTTGGAGAC GACTACTATT CATCCAAATG TGATTCAACT 6360 

GGTGGAGGAT ACTAGCCGTG AAAGTAGTTA TGCTATGATG AAGGCCAAGG GCTATCTAGA 6420 

CCTTCTCATT CCTCGTGGAG GAGCTGGCTT GATCAATGCA GTGGTTGAGA ATGCGATTGT 6480 

ACCTGTTATC GAGACAGGGA CTGGGATTGT CCATGTCTAT GTGGATAAGG ATGCAGACGA 6540 

AGACAAGGCG CTGTCTATCA TCAACAATGC TAAAACCAGT CGTCCTTCTG TTTGTAATGC 6600 

CATGGAGGTT CTGCTGGTTC ATGAAAACAA GGCAGCAAGC TTCCTTCCTC GCTTGGAGCA 6660 

AGTGTTGGTT GCAGAGCGTA AGGAAGCTGG ACTGGAACCA ATTCAATTCC GCCTAGATAG 6720 

CAAAGCAAGC CAGTTTGTTT CAGGTCAAGC AGCTGAGACC CAAGACTTTG ACACCGAGTT 6780 

TTTAGACTAT GTCCTTGCTG TTAAGGTTGT GAGCAGTTTA GAAGAAGCGG TTGCGCACAT 6840 

TGAATCCCAC AGCACCCATC ATTCGGATGC TATTGTGACG GAAAATGCTG AAGCTGCAGC 6900 

ATACTTTACA GATCAAGTGG ACTCTGCAGC GGTGTATGTT AATGCCTCAA CTCGTTTCAC 6960 

AGATGGAGGA CAATTTGGTC TTGGTTGTGA AATGGGGATT TCTACTCAGA AATTGCACGC 7020 

GCGTGGTCCC ATGGGCTTGA AAGAGTTGAC CAGCTACAAG TATGTGGTTG CCGGTGATGG 7080 

GCAGATAAGG GAGTAAGAGA TGAAGATTGG ATTTATCGGT TTGGGGAATA TGGGTGCTAG 7140 

CTTGGCAAAA TCTGTCTTGC AGACTAGGAC GTCAGATGAG ATTCTCCTTG CCAATCGTAG 7200 

TCAAGCTAAG GTAGATGCTT TCATTGCAGA CTTTGGTGGT CAGGCTTCCA GCAATGAAGA 7260 

AATGTTTGCA GAAGCAGATG TGATTTTTCT AGGAGTTAAG CCTGCTCAGT TTTCTGAACT 7320 

GCTTTCTCAA TACCAGACCA TCCTTGAAAA AAGAGAAAGT CTTCTTTTGA TTTCGATGGC 7380 

AGCTGGATTG ACCTTAGAAA AACTAGCAAG TCTTATCCCA AGTCAACACC GAATTATTCG 7440 
TATGATGCCT AATACCCCTG CTTCTATCGG GCAAGGAGTG ATTAGTTATG CCTTGTCTCC 7500 

TAATTGCAGG GCTGAGGACA GTGAGCTCTT TTATCAGCTT TTAGCGAAGG CTGGTCTCTT 7560 

GGTTGAACTA GGAGAAAGTT TAATCGATGC AGCGACAGGT CTTGCAGGTT GTGGACCAGC 7620 

CTTTGTCTAT CTTTTTATCG AGGCCTTGGC AGATGCAGGT GTTCAGACAG GATTACCACG 7680 

AGAAATAGCA TTGAAAATGG CAGCACAAAC TGTGGTAGGA GCTGGGCAAT TGGTCCTTGA 7740 

AAGTCAGCAA CATCCTGGAG TATTGAAAGA CCAAGTCTGT AGCCCAGGCG GTTCGACTAT 7800 

CGCTGGTGTA GCAAGCCTAG AAGCGCATGC TTTCCGAGGA ACAGTCATGG ATGCAGTTCA 7860 

TCAAGCCTAC AAACGAACAC AAGAACTAGG TAAATAAGAG GTAGTTTTGA CTGCCTCTTT 7920 

TATGGTGGCT GAAATGAGAA GACACAAAAA GATTGTCACA AACCCCTATT TTTTTGATAG 7980 

AATAGAAGTA GTAAAAAAGA AATGAGTTAG ACATGTCAAA AGGATTTTTA GTCTCTCTTG 8040 

AGGGACCAGA GGGAGCAGGC AAGACCAGTG TTTTAGAGGC TCTGCTACCA ATTTTAGAGG 8100 

AAAAAGGAGT AGAGGTGTTG ACGACCCGTG AACCTGGCGG AGTCTTGATT GGGGAGAAGA 8160 

TTCGGGAAGT GATTTTGGAT CCAAGTCATA CTCAGATGGA TGCTAAAACA GAGCTACTTC 8220 

TCTATATTGC CAGTCGCAGA CAGCATTTGG TGGAAAAAGT TCTTCCAGCC CTTGAAGCTG 8280 

GCAAGTTGGT CATCATGGAT CGTTTTATCG ATAGTTCTGT TGCCTATCAG GGATTTGGTC 8340 

GTGGCTTAGA TATTGAAGCC ATTGACTGGC TCAATCAGTT TGCGACAGAT GGCCTCAAAC 8400 

CCGATTTGAC ACTCTATTTT GACATCGAGG TGGAAGAAGG GCTGGCTCGT ATTGCTGCTA 8460 

ATAGTGACCG CGAGGTTAAT CGTTTGGATT TGGAAGGGTT GGACTTGCAT AAAAAAGTTC 8520 

GTCAAGGCTA CCTTTCTCTT CTGGATAAAG AGGGAAATCG CATTGTCAAG ATTGATGCTA 8580 

GTCTCCCTTT GGAGCAAGTT GTGGAAACTA CCAAGGCTGT CTTGTTTGAC GGAATGGGCT 8640 

TGGCCAAATG AAACAAGATC AACTAAAGGC 1TGGCAACCA GCTCAGTTTG ACCGTTTTGT 8700 

CCGTATCTTA GAACAAGACC AGCTCAATCA CGCCTATCTC TTTTCAGGTT TCTTTGAAAG 8760 

CTTGGAAATG GCGCAATTTT TAGCTAAGAG CCTCTTTTGT ACGGATAAAG TTGGCGTCTT 8820 

ACCATGTGAG AAATGCCGAA GTTGCAAGCT GATTGAACAG GGAGAATTTC CCGATGTCAC 8880 

CTTGATTAAA CCAGTTAATC AGGTCATTAA GACGGAACGC ATTCGAGAAT TGGTGGGTCA 8940 

GTTTTCTCAA GCAGGGATTG AAAGCCAGCA ACAGGTCTTT ATCATCGAGC AAGCGGATAA 9000 

AATGCATCCC AACGCAGCCA ATTCTCTGCT CAAGGTCATC GAAGAACCCC AGAGTGAAGT 9060 

TTATATTTTC TTCTTGACTA GCGATGAGGA AAAGATGTTA CCGACAATCC GAAGTCGGAC 9120 

TCAGATCTTC CACTTTAAAA AGCAAGAAGA AAAACTTATC TTACTCTTAG AACAAATGGG 9180 

ACTTGTTAAG AAAAAAGCGA CTCTTTTAGC TAAGTTTAGT CAATCGCGAG CTGAAGCAGA 9240 
AAAGTTGGCT AATCAGGCAA GTTTTTGGAC CTTGGTCGAT GAAAGTGAAC GCCTGCTGAC 9300 

TTGGTTAGTA GCTAAGAAAA AAGAAAGTTA TCTACAGGTT GCCAAATTAG CCAACTTGGC 9360 

AGATGATAAG GAAAAACAGG ATCAGGTTTT ACGGATTCTT GAAGTTCTCT GTGGGCAGGA 9420 

CCTCTTGCAG GTAAGAGTAA GAGTGATTCT ACAAGATTTA CTAGAAGCTA GAAAAATGTG 9480 

GCAAGCTAAT GTCAGCTTTC AAAATGCCAT GGAATATCTG GTCTTGAAAG AAATATAAAC 9540 

TCAAAAATGA ATGATAAAGA AAGGAAAGGG CTGTTTTATG GACAAAAAAG AATTATTTGA 9600 

CGCGCTGGAT GATTTTTCCC AACAATTATT GGTAACCTTA GCCGATGTGG AAGCCATCAA 9660 

GAAAAATCTC AAGAGCCTGG TAGAGGAAAA TACAGCTCTT CGCTTGGAAA ATAGTAAGTT 9720 

GCGAGAACGC TTGGGTGAGG TGGAAGCAGA TGCTCCTGTC AAGGCCAAGC ATGTTCGTGA 9780 

AAGTGTCCGT CGCATTTACC GTGATGGATT TCACGTATGT AATGATTTTT ATGGACAACG 9840 

TCGAGAGCAG GACGAGGAAT GTATGTTTTG TGACGAGTTG CTATACAGGG AGTAGGCATG 9900 

CAGATTCAAA AAAGTTTTAA GGGGCAGTCT CCCTATGGCA AGCTGTATCT AGTGGCAACG 9960 

CCGATTGGCA ATCTAGATGA TATGACTTTT CGTGCTATCC AGACCTTGAA AGAAGTGGAC 10020 

TGGATTGCTG CTGAGGATAC GCGCAATACA GGGCTTTTGC TCAAGCATTT TGACATTTCC 10080 

ACCAAGCAGA TCAGTTTTCA TGAGCACAAT GCCAAGGAAA AAATTCCTGA TTTGATTGGT 10140 

TTCTTGAAAG CAGGGCAAAG TATTGCTCAG GTCTCTGATG CCGGTTTGCC TAGCATTTCA 10200 

GACCCTGGTC ATGATTTAGT TAAGGCAGCT ATTGAGGAAG AAATTGCAGT TGTGACAGTT 10260 

CCAGGTGCCT CTGCAGGAAT TTCTGCCTTG ATTGCCAGTG GTTTAGCGCC ACAGCCACAT 10320 

ATCTTTTACG GTTTTTTACC GAGAAAATCA GGTCAGCAGA AGCAATTTTT TGGCTTGAAA 10380 

AAAGATTATC CTGAAACACA GATTTTTTAT GAATCACCTC ATCGTGTAGC AGACACGTTG 10440 

GAAAATATGT TAGAAGTCTA CGGTGACCGC TCCGTTGTCT TGGTCAGGGA ATTGACCAAA 10500 

ATCTATGAAG AATACCAACG AGGTACTATC TCTGAGTTAT TAGAAAGCAT TGCTGAAACG 10560 

CCACTCAAGG GCGAATGTCT TCTCATTGTT GAGGGTGCCA GTCAGGGTGT GGAGGAAAAG 10620 

GACGAGGAAG ACTTGTTCGT AGAAATTCAA ACCCGCATCC AGCAAGGTGT GAAGAAAAAC 10680 

CAAGCTATCA AGGAAGTCGC TAAGATTTAC CAGTGGAATA AAAGTCAGCT CTACGCTGCC 10740 

TACCACGACT GGGAAGAAAA ACAATAAAGG GAGACAGGAT GTAATAATTC TGTCTGTTTC 10800 

TGTTTAACTT AATTAGTGAT GATAATATAA AGATGTATCA CTTGGTATAG AAGCTTTGGT 10860 

ATTAAGTTTT TTATTAAGCC CATACGGAAT ACCGATGGTT GGAGCAGCAG TTATAGCGTT 10920 

CTTAGAAGGT ATAAATAGAA AAATAAGGTC ATTTTAAATC AAAGGATTGA TAAATCAGAA 10980 
AGAAGGTGAT TTTTTGCGAA CATACGAAAA TAAAGAAGAA CTAAAAGCTG AGATAGAGAA 11040 

AACATTTGAG AAATATATTT TAGAATTTGA TAATATTCCA GAAAATTTAA AAGATAAGAG 11100 

AGCTGATGAA GTTGACAGAA CTCCAGCAGA AAACCTTGCT TATCAGGTTG GTTGGACCAA 11160 

CTTGGTTCTT AAATGGGAAG AAGATGAAAG AAAGGGGCTT CAAGTAAAAA CACCATCGGA 11220 

TAAATTTAAA TGGAATCAAC TTGGTGAATT ATATCAGTGG TTCACAGATA CCTACGCTCA 11280 

TTTATCTCTG CAAGAGTTGA AAGCAAAATT AAATGAAAAT ATTAATTCTA TCTCTGCAAT 11340 

GATTGATTCG TTGAGTGAGG AAGAATTATT TGAACCGCAT ATGAGAAAGT GGGCTGATGA 11400 

AGCGACTAAA ACAGCGACTT GGGAAGTGTA TAAGTTTATT CATGTAAATA CGGTTGCACC 11460 

TTTTGGAACT TTCAGAACTA AAATCAGAAA ATGGAAGAAG ATAGTATTAT AAATTATATT 11520 

TTTAACTTTA AAAAATTTCA TAAAAATGGT TACCAAAGGC GATAGAAGAA AAACTATCGT 11580 

CTTTTTCTTT GCAAATTTTT AAGAAGGGAG GTGATCTTGC ATGGACTTTG AATATTTTTA 11640 

TAACAGAGAA GCGGAAAGAT TTAACTTCTT AAAAGTACCG GAGATATTAG TTGATAGAGA 11700 

AGAATTTCGG GGCTTATCAG CAGAAGCAAT TATCCTTTAT TCCATACTTC TTAAACAGAC 11760 

AGGAATGTCA TTTAAGAATA ACTGGATAGA CAAGGAAGGC AGAGTATTTA TCTATTTTAC 11820 

TGTCGAAGAA ATTATGAAAA GAAGAAATAT CTCAAAGCCA ACTGCCATAA AAACATTAGA 11880 

TGAGCTTGAT GTAAAAAAGG AATAGGACTG ATCGAAAGAG TAAGGCTTGG ACTTGGTAAG 11940 

CCGAACATCA TTTATGTTAA AGACTTTATG AGTATATTTC AGGTAAAAGA AAATGACTTA 12000 

CAGAAGTCAA AAAACTTAAC TTCAGAAGTA AAAGATTTTA ACCTCAGAAG TAAAGAAAAT 12060 

GAACTTCAAG AGGTTAAGAA CCTTGACTCT AACTATATAG AGAATAATAA GAGTAAGTAT 12120 

AGTAAGAGAG AATATAGTTT TGGTGAAAAC GGACTTGGAA CATTTCAAAA TGTGTTTTTA 12180 

GCTGCTGAAG ATATATCGGA TTTACAAATC ATAATGAACT CACAGCTTGA GAATTACATT 12240 

AGACTTCCTG CAAAACTAGA ATCCTAGTTC ATGATTGATA ATGCCAGCAA TCAAATTCAT 12300 

TCGTAATCCG AAGCGTTTAC GATGATTTCG ATAGATTGTT GAAAACATTT TAAACGTTTT 12360 

TACTTTGGCA AAGATGTTCT CAATCTTGCT TCTCTCCTTG GATAGCGCAT GGTTACAGGC 12420 

TTTATCTTCA GCTGTTAGCG GCTTGAGTTT GCTGGATTTA OGTGGAGTTT GTACTTGAGG 12480 

ATATATCTTC ATGAGCCCTT GATAACCACT GTCAGACAAG ATTTTACCAG CTTGTCCGAT 12540 

ATTTCTGCGA CTCATTTTGA ACAACTTCAT ATCACGACAA TAGTTCACAG CGATATCCAA 12600 

AGAAACAATT CTCCCTTGAC TTGTGACAAT CGCTTGAGCC TTCATAGCGT GAAATTTCTT 12660 

TTTACCAGAA TGATTCGCTA ATTCTTTTTT TAGGGCGATT GATTTTTACT TCCGTCGCAT 12720 

CAATCATTAC CGTGTCCTCA GAACTGAGAG GAGTTCTTGA AATCGTAACA CCACTTTGAA 12780 
CAAGAGTTAC TTCAACCCAT TGGCTCCGAC GGATTAAGTT GCTTTCGTGA ATACCAAAAT 12840 

CAGCCGCAAT TTGTTCATAA GTTCGATATT CTCGCACATA TTGAAGAGTG GCCATAAGAA 12900 

GGTCTTCTAG GCTTAATTTA GGTTTTCGTC CACCTTTTGC GTGTTTAAGT TGATAAGCTG 12960 

TTTTTAATAC AGCTAATATC TCTTCAAAAG TCGTGCGCTG AACACCAACA AGACGCTTAA 13020 

ATCGTGCATC AGTTAGTTGT TTACTTGCTT CATCATTCAT AGAACTACTA TACCATATTT 13080 

TGTTTCGCAG GAAGTCTATT GGAAAGTAAG AAATATTGAA GCTGAGGCTA TTAGAAGAAA 13140 

TTGTGAGCGT GGTGCTATTT TTTCAGGTAA AATAAAATAT CACGAAGATT CACAGTTTAA 13200 

AGGAGATCAC TATGTTGAAT GTTATGCTGT TTTAGATAAT ACGGTTATAG CAAGAGATAG 13260 

AATAACAGTC CCTATCGATC CGTTATGTGG AAAAGATTTT ATAGAGTAGC ATATAATTGA 13320 

TTCTTAACTG GAATACTCAC TATCTCTTTA CATCAAGAAA ATGACTAAAC AGGGAAGTTT 13380 

GCCTTCTTCC CTTTTTTTGT TATACTAGTA GAAGAAAAAA TTAGAAAGAT TTGTGGGTGT 13440 

CAAACAGCCC AGTGGGGTGT TTTAATATGG ACTTAGGTCC CACCCAAAGA GGTATTAGTG 13500 

TCGTGTCTGA ATCTTATATC AATGTTATCG GTGCTGGTTT GGCAGGTTCT GAAGCAGCTT 13560 

ACCAAATCGC AGAGCGTGGT ATTCCAGTTA AACTATATGA AATGCGTGGT GTCAAGTCTA 13620 

CACCCCAGCA TAAAACAGAC AATTTTGCTG AGTTGGTTTG TTCCAATTCT TTGCGTGGGG 13680 

ATGCTTTGAC AAATGCAGTT GGTCTTCTCA AGGAAGAAAT GCGTCGCTTG GGTTCTGTTA 13740 

TCTTGGAATC TGCTGAGGCT ACACGTGTTC CTGCAGGTGG TGCCCTTGCA GTGGACCGTG 13800 

ATGGTTTCTC TCAAATGGTG ACCGAAAAAG TTGCCAACCA CCCCTTGATT GAAGTGGTTC 13860 

GTGATGAAAT TACAGAATTG CCGACAGATG TTATTACGGT TATCGCTACT GGTCCTTTGA 13920 

CAAGTGATGC CTTGGCTGAA AAGATTCATG CTCTTAATGA CGGTGCTGGT TTTTATTTCT 13980 

ACGATGCGGC AGCGCCTATT ATCGATGTCA ACACTATCGA TATGAGCAAG GTCTACCTCA 14040 

AATCACGTTA TGATAAGGGA GAAGCGGCCT ACCTCAATGC CCCTATGACC AAGCAAGAAT 14100 

TTATGGATTT CCATGAAGCT TTGGTCAATG CAGAAGAAGC ACCGCTTAGT TCTTTTGAAA 14160 

AAGAAAAGTA CTTTGAAGGA TGTATGCCTA TCGAAGTCAT GGCCAAACGT GGCATTAAAA 14220 

CTATGCTTTA TGGCCCTATG AAGCCAGTCG GTCTTGAGTA CCCAGACGAC TATACAGGAC 14280 

CTCGTGATGG AGAATTTAAA ACACCTTATG CGGTTGTGCA ACTTCGTCAG GATAATGCAG 14340 

CTGGTAGCCT CTACAATATT GTTGGTTTCC AGACCCACCT CAAATGGGGA GAACAAAAGC 14400 

GTGTCTTCCA AATGATTCCG GGTCTTGAAA ATGCGGAGTT TGTCCGTTAT GGTGTGATGC 14460 

ATCGCAATTC TTACATGGAT TCACCAAATC TTCTTGAGCA GACTTACCGT TCTAAGAAAC 14520 
AACCAAATCT CTTCTTTGCT GGTCAAATGA CGGGTGTGGA AGGCTATGTT GAGTCGGCGG 14580 

CTTCAGGCTT AGTTGCGGGA ATTAACGCAG CTCGTCTCTT CAAGGAAGAA AGCGAGGCTA 14640 

TTTTCCCCGA GACGACAGCG ATTGGAAGCT TAGCTCATTA CATTACCCAT GCCGACAGCA 14700 

AACATTTCCA ACCAATGAAT GTCAATTTTG GGATCATCAA GGAGTTGGAA GGCGAGCGTA 14760 

TCCGTGATAA GAAGGCTCGT TATGAAAAAA TTGCAGAGCG TGCCCTTGCC GACTTAGAGG 14820 

AATTTTTGAC TGTCTAATTT TTTTGAAAGA ATTGCTCATG ATACTATAAA AATCTTAGAA 14880 

ATTGTGATAA AATAGGTAGG ATGAAAGAAG GAGAGTGAAA ATGGCGAATC CCAAGTATAA 14940 

ACGTATTTTA ATCAAGTTAT CAGGTGAAGC CCTTGCCGGT GAACGTGGCG TAGGGATTGA 15000 

TATCCAAACA GTTCAAACAA TCGCAAAAGA GATTCAAGAA GTTCATAGCT TAGGTATCGA 15060 

AATTGCCCTT GTTATCGGTG GAGGAAATCT CTGGCGTGGA GAACCTGCAG CAGAAGCAGG 15120 

TATGGACCGT GTTCAGGCAG ATTACACAGG AATGCTTGGG ACTGTTATGA ATGCTCTTGT 15180 

GATGGCAGAT TCATTGCAAC AAGTTGGGGT TGATACGCGT GTACAAACAG CTATTGCCAT 15240 

GCAACAAGTG GCAGAGCCTT ATGTCCGTGG ACGTGCCCTT CGTCACCTTG AAAAAGGCCG 15300 

TATCGTTATC TTTGGTGCTG GAATTGGTTC ACCTTACTTC TCGACAGATA CAACAGCGGC 15360 

CCTTCGTGCA GCTGAAATCG AAGCAGATGC CATCCTCATG GCTAAAAATG GTGTCGATGG 15420 

TGTTTACAAT GCCGATCCTA AGAAAGATAA GACAGCTGTT AAGTTTGAAG AATTGACCCA 15480 

CCGTGACGTT ATCAATAAAG GTCTTCGTAT CATGGACTCA ACAGCTTCAA CCCTCTCAAT 15540 

GGACAACGAC ATTGACTTGG TTGTATTCAA CATGAACCAA CCAGGCAACA TCAAACGTGT 15600 

CGTATTTGGT GAAAATATCG GAACAACAGT TTCAAATAAT ATCGAAGAAA AGGAATAAGA 15660 

AAGAATATGG CTAACGCAAT TATTGAAAAA GCTAAAGAGA GAATGACCCA GTCTCACCAA 15720 

TCACTTGCTC GTGAATTTGG TGGTATCCGT GCTGGTCGTG CCAATGCAAG CTTGCTTGAC 15780 

CGTGTACATG TAGAATACTA TGGAGTCGAA ACTCCTCTTA ACCAAATCGC TTCAATTACG 15840 

ATTCCAGAAG CGCGTGTTTT GTTGGTAACA CCATTTGACA AGTCTTCATT GAAAGACATC 15900 

GAACGTGCCT TGAACGCTTC TGATATTGGT ATCACACCGG CTAATGACGG TTCTGTGATT 15960 

CGCTTGGTTA TCCCAGCTCT TACAGAAGAA ACTCGTCGTG ACCTTGCTAA AGAAGTGAAG 16020 

AAGGTCGGCG AAAATGCTAA AGTGGCTGTC CGCAATATCC GTCGCGATGC TATGGACGAA 16080 

GCTAAGAAAC GAGAAAAAGC AAAAGAAATC ACTGAAGACG AATTGAAGAC TCTTGAAAAA 16140 

GACATTCAAA AAGTAACAGA CGATGCTGTT AAACACATCG ACGACATGAC TGCTAACAAA 16200 

GAGAAAGAAC TTTTGGAAGT CTAAAAATAA ACAGAAAAAC TCAGTTGGCA TTGCTGGCTG 16260 

AGTTTTATTC GAAAGAAGGA AATATGAATA CAAATCTTGC AAGTTTTATC GTTGGACTGA 16320 




TCATCGATGA AAACGACCGT TTTTACTTTG TGCAAAAGGA TGGTCAAACC TATGCTCTTG 16380 

CTAAGGAAGA AGGCCAACAT ACAGTAGGGG ATACGGTCAA AGGTTTTGCA TACACGGATA 16440 

TGAAGCAAAA ACTCCGCCTG ACAACCTTAG AAGTGACTGC CACTCAGGAC CAATTTGGTT 16500 

GGGGACGTGT CACAGAGGTT CGTAAGGACT TGGGTGTCTT TGTGGATACA GGCCTTCCTG 16560 

ACAAGGAAAT CGTTGTGTCA CTCGATATTC TCCCTGAGCT CAAGGAACTC TGGCCTAAGA 16620 

AGGGCGACCA ACTCTACATC CGTCTTGAAG TGGATAAGAA AGACCGTATC TGGGGCCTCT 16680 

TGGCTTATCA AGAAGACTTC CAACGTCTTG CTCGTCCTGC CTACAACAAC ATGCAGAACC 16740 

AAAACTGGCC AGCCATTGTT TACCGTCTCA AGCTGTCAGG AACTTTTGTT TACCTACCAG 16800 

AAAATAATAT GCTTGGTTTT ATTCATCCTA GCGAGCGTTA CGCAGAGCCA CGTTTGGGGC 16860 

AAGTATTAGA TGCGCGCGTT ATTGGTTTCC GTGAAGTGGA CCGCACTCTG AACCTCTCCC 16920 

TCAAACCACG CTCCTTTGAA ATGTTGGAAA ACGATGCTCA GATGATTTTG ACTTATTTGG 16980 

AAAGCAATGG CGGTTTCATG ACCTTAAATG ACAAGTCATC TCCAGACGAC ATCAAGGCAA 17040 

CCTTTGGCAT TTCTAAAGGT CAGTTCAAGA AAGCTTTAGG TGGTCTTATG AAGGCTGGTA 17100 

AAATCAAGCA GGACCAGTTT GGGACAGAGT TGATTTAGGG AGGCTTATGA GAAAATCATT 17160 

TTACACTTGG CTCATGACCG AGCGCAATCC TAAAAGTAAC AGTCCCAAAG CAATTTTGGC 17220 

AGACCTCGCT TTTGAAGAGT CAGCCTTTCC AAAACACACA GATGATTTTG ATGAGGTCAG 17280 

TCGCTTTTTG GAGGAGCATG CCAGTTTCTC TTTTAACCTA GGAGATTTTG ACAGCATTTG 17340 

GCAGGAATAT CTAGAACACT AGCATTTATT CATTGGGTTT GGGCTAGTAA TTTCTCCATC 17400 

CCTCTGCTAT AATAAAAAGA AATAAAAGGA TTAGAGAGGT TCTTTATTTG AAGGAACATT 17460 

CAATAGACAT TCAACTGAGT CATCCAGATG ACCTGTTTCA TCTTTTTGGT TCCAATGAAC 17520 

GCCATCTTCG TTTGATGGAA GAAGAGCTTG ATGTTGTGAT TCATGCTCGT ACGGAGATTG 17580 

TCCAGGTTTT GGGAGAAGAG TCTGCCTGTG AGGAAGCCCG TCAAGTTATT CAGGCTTTGA 17640 

TGGTCTTGGT AAATCGTGGG ATGACCGTTG GTACGCCAGA TGTAGTCACT GCGATTAGCA 17700 

TGGTCAAAAA TGATGAAATT GACAAGTTTG TCGCCCTTTA CGAAGAAGAA ATTATCAAGG 17760 

ATAATACTGG GAAACCTATC CGTGTCAAAA CCCTAGGGCA AAAGCTTTAT GTGGACAGTG 17820 

TCAAACAGCA TGATGTGACC TTTGGAATTG GGCCAGCAGG TACAGGGAAG ACCTTCCTTG 17880 

CAGTGACCTT GGCAGTGACT GCCCTTAAAC GTGGGCAAGT CAAGCGAATT ATCCTAACTC 17940 

GTCCAGCGGT GGAAGCGGGA GAGAGTCTTG GATTTCTTCC GGGTGATCTT AAGGAGAAGG 18000 

TGGATCCTTA CCTTCGTCCT GTTTACGATG CCTTGTATCA AATTCTTGGG AAAGACCAAA 18060 
CGACTCGTCT CATGGAGCGT GAAATTATCG AAATTGCGCC CCTTGCCTAT ATGCGTGGCC 18120 

GGACCTTGGA TGATGCCTTT GTCATTCTCG ATGAGGCGCA AAACACGACC ATCATGCAGA 18180 

TGAAGATGTT CTTGACGCGT TTAGGTTTTC ATTCTAAGAT GATTGTCAAT GGAGATATTA 18240 

GTCAGATTGA CCTGCCACGT AATGTCAAGT CCGGTTTGAT TGATGCTCAA GAGAAACTCA 18300 

AGAACATCCA TCAGATTGAC TTTGTTCATT TTTCAGCCAA GGATGTGGTT CGOCATCCTG 18360 

TTGTCGCTCA GATTATCCGA GCCTATGAAT ATTCTACTGA AGTTGCACAC GACTGATTTT 18420 

GAGGAAGTTC GCCTGCAAAA GAATAGACTT GTTCGGTAAC TGTAAAAAGT GTTATACTAT 18480 

TTTTATGGAA ACAGTATACG ACAAAGCACA AAAACTTAAC TCAAAAAACT TCAAACTATT 18540 

GATTGGTGTC AAAAAGGAAA CCTTTCAACT CATGCTAGAA CACCTGAATT CAGCCTATCA 18600 

GATTCAGCAC CGAAAAGGTG GACGTCCACG TAGTCTGCCC ATGGAAGACC AGCTCATTAT 18660 

GACCCTCCGT TACTTGCGAT ATTATCCCAC TCAGCGTCTG CTGGCCTTTG ATTTTGGCGT 18720 

CGGTGTAGCT ACGGTAAATG CCATCATCAC TTGGGTGGAG GATACACTTC GTGCGTCAGG 18780 

TAGCTTTGAT TTGGACCATT TAGAAGCCCC GAGTGCTGCT GTGGCTATTG ACGTGACCGA 18840 

AAGTCCGATT CAGCGTCCAA ACAAAACCAA AGCAAAAATT ATTCTGGTAA AAAGAAACGA 18900 

CACACCTTAA AAACTCAAAT TATGCTGGAT TTGACGACAC ATAAAGTCTG TCAAATGGCC 18960 

TTTTCTGACG GACATACGCA TGATTTTACT CTCTTCAAAG AAAGTATTGG ACAAAGTTTG 19020 

CCTGAAACGA CGCTTGCCTT TGTTGACCTA GGTTATTTAG GCATCTTGAA ATTTCATGAG 19080 

AATACTTTCA TTCCTGCTAA AAATTCCAAA AATCGCCGCC TGAGTGAGGA TGATAAGCAG 19140 

TTAAATAAAG AGATGTCAGC GATACGAATT GAAATTGAAC ATTTTAACGC TAAATTCAAG 19200 

ACCTTCCAAA TCATGTCAGT CCCTTATCGT AACCGCAGAA AACGTTTCGA GTTACGGGCG 19260 

GAATTAATTT GTGCCATCAT CAATTATGAA GTGAACTAGA TTCCGAACAA GTCTAATATA 19320 

CTTTTGAGAG AGGAAAATCC AGTTGTATAG GCTAAAGGTT TTATCCAAAG GTCTGAGACA 19380 

ACGATTAGGC ACGATGGAAA GAACTTTTAT GTGGCTGATG ACGATCAGTG CATCTTCCTG 19440 

TGTCATAATC ACAGGGCACA AGAAAGTAGG AATTTGAAAA GATGATTGAC CAACTATCTA 19500 

AGTATTACAG TTGTAGGATA CTAACTGAAA AGGATATTCC AAGTATTTTA TCTTTATATG 19560 

AAAGTAATCC TCTGTATTTT CAGCATTGTC CACCAGAGCC AAATTTTGCA ACTGTAAAAG 19620 

AGGACATGCT TTGTCTACCT GAAGGTAAAG CTAAGGCTGA TAAGTTTTTT GTTGGATTTT 19680 

GGAATGGATC TGACCTTGTG GCTGTTATGG ATTTTGTCTA TGCATATCCT GATGAGGAGA 19740 

CTGTTTTTAT TGGTTTGTTT ATGGTTGATC AAGCCTATCA GAGAAAAGGG ATTGGTAGTC 19800 

ATATTGTGAC AGAAGCACTA GCTTATTTTG CTAAGAACTT TCGAAAGGCA CGTTTGGCTT 19860 
ATGTTAAGGG AAATCCGCAA TCTCAGCATT TTTGGGAAAA GCAGGGCTTT AAATCAATTG 19920 

GATGCGAGGT TAAGCAAGAA CTCTATACGG TTGTTATCGC TGAACAGAGC CTAGAAGATT 19980 

AGAAATGGCA TCAAGTAAGA ACTATTTGGA ATTTGTTTTG GAACAATTAT CAGGATTAGA 20040 

TGATGTGACT TACCGTTCCA TGATGGGGGA GTATATTCTT TACTTCCGCG GCAAGATTAT 20100 

TGGCGGCATT TATGACGATC GCTTTTTAGT TAAACCCGTG CAAGCAGTCT TAGATAAGAT 20160 

TGACCAATCT TCTTTTGAGT TTCCATACAA AGGTGCCAAA GAAATGATTT GAGTGGAAGA 20220 

ACTTGATAAT AAGATGTTTC TATAAGACCT AATTTTAGCT ATGTATAACC AACTGCCAAC 20280 

GCCCAAACCT AAAAAGAAAA AGCAAGGGTG AACGAAGTAA AAAAGAAGTC TGCTAAGGCC 20340 

CTGTCTTTGC ACGGGTAAAA TTTTATATAT AAAAAGAAGC TGGGACTAAA GAGCTCAGCT 20400 

TCCTTTGGTT TATATAATTG TCATTACAAG ACGAAGTGGT TGGGCGAAAC TCTGTTGACT 20460 

TTATTCAATT TAGAGTTTCT TATGCACAAT TGAGTCTGGA ACGAAAGTCT CCAGTTGCAA 20520 

AGTATACAGT ACAATAAACC AACGATGTAA TAGCTGATGA CACAAAGCAC AGTGGGTAGG 20580 

ACTTGCGAAG TCACCCTTTT CTTTTCAAAA TTTATACTAA ATCATTGATA TCAGTGTAGT 20640 

CACGATTAAG TCCTTGAGCA ACTGGTAGGT TAGTCAAGTA ACCTTGATAA GTAGTCACAC 20700 

CTTGACGCAA GCCTTCATCT TCAGAGATTG CTTGTGCGAA TCCTTTGCCA GCCAAAGCTT 20760 

CGATATAAGG AAGAGTGACA TTGGTTAGGG CGATGGTTGA AGTGCGAGCA ACCGCACCAG 20820 

GGATATTGGC AACGGCATAG TGGAGAACAC CGTGTTTTTC ATAGACGGGT TCATCGTGCG 20880 

TTGTCACACG GTCAGCTGTT TCGATAACGC CACCTTGGTC AACAGCAACG TCAACGATAC 20940 

AGAGCCTGGA CGCATTTGTT TGACCATCTC ATCTGTCACC AATTCCGGTG CTTTTGCACC 21000 

AGGGATGAGA ATGGCTCCAA TCACCACATC AGCATCTCTC ACACTTGCTT CAATGTTGAA 21060 

TGAATTAGAC ATAAGAGTTT GAATTTGACT TCCAAAGACT TCTTCTAGAA CTGAGAGACG 21120 

CTTGGAACTA ATATCTAAAA TAGTCACTTG AGCACCAAGA CCAAGGGCGA TGCGGGCAGC 21180 

ATGTGTACCG ACGACACCAC CACCGATGAT AGTTACTTTT CCTTTTGGAA CACCTGGTAC 21240 

ACCACCAAGT AGAACACCAG AGCCACCAGC TTGCTTAGTA AGGAAGTGAG CTCCGATTTG 21300 

AACAGCCATA CGACCTGCAA CCTCACTCAT AGGAACGAGG AGCGGTAGTT GTCCTTGATT 21360 

GTCACGAACA GTTTCAGTTG TTTTTGCTGT TAACATAGCA TCTGCTAATT CTGGAGCAGC 21420 

GGCCATGTGC AAGTAGGTGA AGAGAAGAAG ATCGTCGCGC AAGTAACCGT ATTCAGAACT 21480 

TAAAGATTCT TTTACTTTCA CAACCAACTC TGCTGCCCAA GCTTCACCAG CAGTAGCGAC 21540 

AATCTCAGCT CCTTGCTTTT GATAGTCAGC ATCAGTAAAG CCAGAACCGA GACCAGCATT 21600 
TGTTTCGATA AGGACACGAT GACCACGACT AACTAAGCTA TGAACACCTG CAGGTGTGAG 21660 

GGCGACACGG TTTTCGTTAT TTTTAATTTC TTTTGGGATT CCGATTAACA TTGAGATAAC 21720 

CTACCTTTCA ATTGACGGTC TTGTTTTGGT TGTCACATTC CAGTTCATAA ATCAAAAATG 21780 

TGACGGTTTC ATTGTATATG AAACCGCTTC AAAAATCAAG AAAAACTTGT CATCCAAATT 21840 

TTTTTATGCT AGACTAGTGA AAATCAAGCT CTAATGGAGG GAAAAGTATG GAATCAATAT 21900 

TTGTGAAATT TGCCCAGTAT CCGTCTATAG AAACGGAGCG TTTATTGCTC AGACCTGTAA 21960 

CTTTGGATGA TGCGGAACAA TGTTTGACTA TGCCTCGGAC AAGGGTAATA CACGTTACAC 22020 

TTTTCCAACC AATCAAAGCT TGGAAGAAAC CAAGAATAAC ATTGCTCAGT TCTACTTGGC 22080 

TAATCCCTTG GGACGTTGGG GAATAGAACT AAAAAGCAAT GGTCAGTTTA TTGGAACCAT 22140 

TGACTTGCAC AAGATTGATT CTGTTCTTAA GAAGGCAGCT ATTGGCTACA TTATCAATAA 22200 

AAAGTATTGG AATCAAGGAT TAACGACAGA AGCCAATCGT GCTGTGATTG AGCTAGCTTT 22260 

TGAGAAGATA GGGATGAATA AGTTGACTGC CCTTCACGAT AAGGCTAATC CCGCGTCAGG 22320 

AAAGGTCATG GAGAAATCAG GCATGCGTTT TTCCCATGCA GAACCATATG CTTGTATGGA 22380 

CCAGCATGAA AAAGGCCGAA TCGTGACAAG AGTTCATTAT GTCTTGACCA AGGAAGACTA 22440 

TTTTGCAAAT AAATAAGCAG TTGAAAAGAA ATTTTTCGAC TGTTTTTTCT TCCTCTTACG 22500 

AATAATCTAA GAGAGGAGAA AATATGGAAG CAATTATCGA GAAAATCAAA GAGTATAAAA 22560 

TCATCGTCAT CTGTACTGGT CTGGGCTTGC TTGTAGGAGG ATTTTTCCTG CTAAAACCAG 22620 

CTCCACAAAC ACCTGTCAAA GAGACGAATT TGCAGGCTGA AGTTGCAGCT GTTTCCAAGG 22680 

ACTCATCGAC CGAAAAGGAA GTGAAGAAGG AAGAAAAGGA AGAACCCCTT GAACAAGATC 22740 

TAATCACAGT AGATGTCAAA GGTGCTGTCA AATCGCCAGG GATTTATGAC TTGCCTGTAG 22800 

GTAGTCGAGT CAATGATGCT GTTCAGAAGG CTGGTGGCTT GACAGAGCAA GCAGACAGCA 22860 

AGTCGCTCAA TCTAGCTCAG AAAGTTAGTG ATGAGGCTCT GGTTTACGTT CCTACTAAGG 22920 

GAGAAGAAGC AGTTAGTCAA CAGACTGGTT CGGGGACAGC TTCTTCAACA AGCAAGGAAA 22980 

AGAAGGTCAA TCTCAACAAG GCCAGTCTGG AAGAACTCAA GCAGGTCAAG GGACTGGGAG 23040 

GAAAACGAGC TCAGGACATT ATTGACCATC GTGAGGCAAA TGGCAAGTTC AAGTCAGTAG 23100 

ACGAGCTCAA GAAGGTCTCT GGCATTGGTG GCAAAACAAT AGAAAAGCTT AAAGACTATG 23160 

TTACAGTGGA TTAAGAATTT CTCTATTCCC CTAATTTACC TGAGTTTTCT ATTACTTTGG 23220 

CTTTATTACG CTATTTTCTC AGCATCTTAT CTTGCTTTGT TGGGCTTTGT TTTTCTGCTA 23280 

GTCTGTCTCT TTATCCAATT TCCGTGGAAA TCTGCTGGTA AAGTTCTAAT AATTTGCGGA 23340 

ATCTTTGGAT TTTGGTTTGT TTTTCAAAAT TGGCAACAGA GTCAAGCGAG TCAAAATCTG 23400 
GCGGATTCTG TTGAAAGGGT ACGGATTTTG CCTGATACTA TTAAGGTTAA TGGTGATAGT 23460 

CTATCCTTTC GTGGCAAGTC TAACGGTCGT GCTTTCCAAG TCTATTATAA ACTCCAGTCC 23520 

GAGGAGGAGA AAGAAGCCTT TCAAGCTTTA ACTGACCTGC ATGAGATAGG ACTAGAAGGG 23580 

AAGCTTTCGG AGCCAGAAGG GCAGAGAAAT TTTGGTGGCT TTAATTACCA AGCCTATCTG 23640 

AAGACTCAGG GAATTTACCA GACTCTCAAT ATCAAAACAA TCCAGTCACT TCAAAAGATT 23700 

GGCAGTTGGG ATATAGGAGA AAACTTGTCC AGTTTACGTC GAAAGGCTGT GGTTTGGATT 23760 

AAGACGCACT TTCCAGACCC TATGGGCAAT TACATGACAG GACTCTTGCT GGGACATCTG 23820 

GACACCGACT TTGAGGAGAT GAATGAGCTT TATTCCAGTC TAGGAATTAT CCACCTCTTT 23880 

GCCCTATCTG GCATGCAGGT AGGTTTTTTC ATGAATGGAT TTAAGAAACT TCTCTTGCGA 23940 

TTGGGCTTGA CCCAAGAAAA GTTGAAATGG CTGACTTATC CCTTTTCCCT TATCTATGCG 24000 

GGACTAACTG GATTTTCAGC ATCGGTTATT CGCAGTCTCT TGCAAAAGCT ACTGGCTCAA 24060 

CATGGGGTTA AGGGCTTGGA TAATTTTGCC TTGACGGTGC TTGTCCTCTT TATTGTCATG 24120 

CCAAACTTTT TCTTGACAGC AGGAGGAGTC TTGTCCTGCG CTTATGCTTT TATCCTGACC 24180 

ATGACCAGCA AAGAAGGGGA GGGGCTCAAG GCTGTTACTA GTGAAAGTCT AGTCATCTCC 24240 

TTGGGCATAT TGCCCATTCT ATCCTTCTAT TTTGCGGAAT TTCAACCTTG GTCTATCCTT 24300 

TTGACCTTTG TCTTTTCCTT TCTTTTTGAC TTGGTCTTCT TACCGCTCTT GTCTATCTTA 24360 

TTTGTCCTTT CCTTTCTCTA TCCAGTCATT CAGCTGAACT TTATCTTTGA ATGGTTAGAG 24420 

GGCATTATTC GCTTGGTCTC GCAGGTGGCA AGGAGACCAC TTGTCTTTGG TCAACCCAAC 24480 

GCATGGCTTT TAATCTTATT GTTAATTTCC TTGGCTTTGG TCTATGATTT GAGGAAAAAC 24540 

ATTAAAGGAT TAACAGTATT GAGTTTATTG ATTACAGGTC TCTTTTTCCT TACCAAGTAT 24600 

CCACTGGAAA ATGAAATCAC CATGCTGGAT GTGGGGCAAG GAGAAAGTAT TTTCTACGGG 24660 

ATGTAACTGG GAAAACCATT CTCATAGATG TAGGTGGTAA GGCAGAATCT TATAAGAAAA 24720 

TCAAAAAATG GCAAGAAAAG ATGACGACCA GCAATGCCCA GCGAACCTTG ATTCCCTATC 24780 

TCAAAAGTCG AGGAGTAGCT AAGATTGACC AGCTAATTTT GACTAACACG GACAAGGAGC 24840 

ATGTTGGAGA TTTGTCAGAG ATGACCAAGG CTTTCCATGT AGGGGAGATT CTAGTATCAA 24900 

AAGACAGTCT GAAACAGAAG GAATTTGTGG CAGAACTACA GGCGACTCAA ACAAAGGTGC 24960 

GTAGTATGAT AGTAGGGGAG AACTTGCCCA TTTTTGGAAG TCAGTTAGAA GTTCTATCTC 25020 

CAAGGAAAAT GGGAGATGGA GGACACGATG ATACCCTAGT TCTGTATGGG AAATTCTTGG 25080 

ATAAGCAATT TCTCTTCACG GGAAATTTGG AGGAGAAAGG AGAGAAGGAC TTGCTGAAGC 25140 
ACTATCCAGA CTTGAAAGTA AATGTTTTGA AAGCTAGCCA ACATGGCAAT AAAAAATCAT 25200 

CAAGTCCAGC CTTTCTAGAA AAACTCAAAC CAGAGCTTAC TCTTATCTCA GTTGGAAAGA 25260 

GCAATCGAAT GAAACTCCCC CATCAGGAAA CATTGACACG ACTGGAAGGT ATCAATAGCA 25320 

AAGTTTATCG AACTGACCAG CAAGGAGCTA TACGTTTTAA GGGGTTGGAT AGTTGGAAAA 25380 

TCGAAAGTGT TCGATAGGAA GGATAAATGT TGTAGATTAG TGAAATAAAC TAAAAATTTG 25440 

TTGCATAATA ATGATAAAAA TGGTATAATG AAAACGTATT CAATATTGAG GATATAAAAT 25500 

CATTAAAAAT CAGCAAAAGT TGTTTTATTA GTTAGTTTAT AATCTATTGG TCTTCTTCAG 25560 

TCCAGTGTAT CTGCTGTGAC AGTCACTAAA AGTTACAAGT ATGATTGGAA TACGGTTTGG 25620 

GAATATAGTA CCAACTATCA CGACCATCAG TATGCTTGGA TTCCGTCATG GTCTCGTTAT 25680 

GACAGCTATT CTGAGTATAA AGTTGGCGGA GGCTGGAACT ACGCTCGTTA TGAGGTCATA 25740 

AACTATTACA GCGGAGGCTA TTAATTCTTA AAGAGTGAGA AAAAGGAGGG CTAGATATGT 25800 

TGCAGCTTAC TCATGTGACC TTAAAAACGC GACAAGTCAT CTTGCAAGAT GTGGATTTCA 25860 

CCTTTAAAAA GGGTAGGGTT TATGGTCTTC TTGCTATCAA TGGCTCTGGA AAGACGACCC 25920 

TGTTCCGTGC CATTAGCAAT TTAATTCCCA TAAGTAGTGG AAATATCGCA GCCCCTCCTT 25980 

CTTTATTTTA TTATGAGAGT ATTGAATGGC TGGATGGAAA CTTAAGTGGG ATGGACTACC 26040 

TTCGTCTTAT CAAAAACATC TGGAAGTCAG GTCTGAACTT GAGGGATGAA ATCGCCTATT 26100 

GGGAAATGTC TGACTATATC AGTCTTCCCA TTCGCAAGTA TTCCTTAGGC ATGAAGCAAC 26160 

GCTTGGTGAT TGCCATGTAT TTCCTCAGTC AGGCCAAATG CTGGCTCATG GATGAGATTA 26220 

CAAATGGCTT AGATGAGTAT TATCGACAGA AGTTTTTTGA TAGGCTAGCA CAAATCGATA 26280 

GACAAGAACA GCTGGTTCTT TTAAGTTCCC ACTATAAGGA AGAGTTGGTT GATGTCTGCG 26340 

ATAGAGTAGT AACCATTCAT CAGGGGCAGA TAGAAGAGGT TTAGTTTATG AAAGATGTTA 26400 

GTCTATTTTT ATTGAAAAAA GTTTTCAAAA GCCGCTTAAA CTGGATTGTC TTAGCTTTAT 26460 

TTGTATCTGT ACTCGGTGTT ACCTTTTATT TAAATAGTCA GACTGCAAAC TCACACAGCT 26520 

TGGAGAGCAG GTTGGAAAGT CGCATTGCAG CCAACGAGAG GGCTATCAAT GAAAATGAAG 26580 

AGAAACTCTC CCAAATGTCT GATACCAGCT CGGAGGAATA CCAGTTTGCT AAAAATAATT 26640 

TAGACGTGCA AAAAAATCTT TTGACGCGAA AGACAGAAAT TCTGACTTTA TTAAAAGAAG 26700 

GGCGCTGGAA AGAAGCCTAC TATTTGCAGT GGCAAGATGA AGAGAAGAAT TATGAATTTG 26760 

TATCAAATGA CCCGACTGCT AGCCCTGGCT TAAAAATGGG GGTTGACCGC GAACGGAAGA 26820 

TTTACCAAGC CCTGTATCCC TTGAACATAA AAGCACATAC TTTGGAGTTT CCGACCCACG 26880 

GGATTGATCA GATTGTCTGG ATTTTAGAGG TTATCATCCC AAGTTTGTTT GTGGTTGCTA 26940 
TTATTTTTAT GCTAACACAA CTATTTGCAG AAAGATATCA AAATCATCTG GACACAGCTC 27000 

ACTTATATCC TGTTTCAAAA GTGACATTTG CAATATCCTC TCTTGGAGTT GGAGTGGGAT 27060 

ATGTAACTGT GCTGTTTATC GGAATCTGTG GCTTTTCTTT TCTAGTGGGA AGTCTGATAA 27120 

GTGGTTTTGG ACAGTTAGAT TATCCCTACC CAATTTATAG CTTAGTGAAT CAAGAAGTAA 27180 

CTATTGGGAA AATACAAGAT GTATTATTTC CTGGCTTGCT CTTAGCTTTC TTAGCCTTTA 27240 

TCGTCATTGT GGAAGTTGTG TACTTGATTG CTTACTTTTT CAAGCAAAAA ATGCCTGTCC 27300 

TCTTTCTTTC ACTCATTGGG ATTGTTGGCT TATTGTTTGG TATCCAAACC ATTCAGCCTC 27360 

TTCAAAGGAT TGCACATCTG ATTCCCTTTA CTTACTTGCG TTCAGTGGAG ATTTTATCTG 27420 

GAAGATTACC TAAGCAGATT GATAATGTCG ATCTAAATTG GAGCATGGGA ATGGTCTTAC 27480 

TTCCTTGCCT GATTATCTTT TTGCTATTGG GAATTCTATT TATTGAAAGA TGGGGAAGTT 27540 

CACAGAAAAA AGAATTTTTT AATAGATTCT AGCTTTCCTA TAGGTAGGGA AAATAAGTAA 27600 

AAACTAACAT AGAGAGGGAA TCAACTTGAT TCTCTCTTTT TGATTCGAAA ACCAAACCAA 27660 

AATACAAACA CAAACTTTTC AAAAAATAAC TTTTTATCTT GACAAGAGCT AGAAAACTTG 27720 

GTATCATATA AAAGTTGAGA AAAGCAGAAG TGAGAGCTTC TCGCCTTGTG ACATTAAGTT 27780 

GCCTGGCCCT ACGGATGAAA AGTTTCGAAG AAACGCTATC ATAACGTGCG GGCTTGTATA 27840 

TTTACAAGTC CGCTATTGTT TTTCTCTAAT AAAACAAAAG AGGTGAAAAC CATAGCAAAG 27900 

CAAGACTTAT TCATCAATGA TGAGATTCGT GTACGTGAAG TTCGCTTGAT TGGTCTTGAA 27960 

GGAGAACAGC TAGGTATCAA GCCACTCAGT GAAGCGCAAG CTTTGGCTGA TAACGCTAAT 28020 

GTTGACCTAG TATTGATTCA ACCCCAAGCC AAACCGCCTG TTGCAAAAAT TATGGACTAC 28080 

GGTAAGTTCA AATTTGAGTA CCAGAAGAAG CAAAAAGAAC AACGTAAAAA ACAAAGCGTT 28140 

GTTACTGTGA AAGAAGTTCG TCTAAGTCCG G 28171 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7147 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

CCGCTCAACT TTTGCAATCA AGGCTAAGTA GACAGCAGCA AATTTCATAT TGTATAATTT 60 

CTGACTCATA CTTCTCTCTT TCTATGTGTA CTAGTATAAA TAAGAAAAAG AAGGCCGTCA 120 
AGCCTTCTTT TGATTTATTC TTCTGCTTCA TCTTCTGTAA ATTGACTATT GTACAAGTCA 180

GCGTAGAAGC CACCTTGCGC CATCAGTTCC TCATAGTTGC CTTGCTCGAT GATATTTCCA 240 

TCTTTCATGA CCAAGATCAA GTCTGCATTT CGGATGGTTG ACAAGCGGTG GGCAATGACA 300 

AAGGATGTGC GTCCTTCCAT CAAACGGTCC ATGGCTTTTT GGATCAATTC CTCTGTCCGT 360 

GTGTCAACAG AAGAAGTCGC CTCATCCAAA ATCAAAAGCG GTGCATCCTT AAGAAGGGCA 420 

CGAGCAATAG TCAATAGTTG TTTTTGTCTT ACAGACAAGG TCACGGTGTC ATCCAAGATG 480 

GTATCATAGC CATCTGGCAA GGTCATAATA AAGTGGTGAA TTCCCACAGC CTTACTAGCT 540 

TCCATCATTC GTTCATCACT AATCCCTATT TGATTATAGA TGAGATTGTC TCGAATAGTT 600 

CCTTCAAAGA GCCAGGTATC CTGCAAGACC ATTGAAAAGG CATCATGCAC TTCTGAACGC 660 

GTCATAGCCT TGGTATCCAC ACCATCAATG CGAATACTTC CCTTATCAAT CTCATAGAAT 720 

TTCATCAAAA GATTGACAAT GGTTGTCTTA CCAGCCCCAG TCGGCCCAAC AATGGCAACC 780 

TTTTGACCAG CATGAGCTGT CGCAGAGAAG TCATAGTCTT GAACATTGAC ACCGTCCACC 840 

AGAATTTCTC CTGCTGACAC GTCGTAGAAA CGTGGAATCA GATTGACCAG AGTTGATTTA 900 

CCAGAACCTG TTGACCCAAT AAAGGCCACT GTTTGACCAG TTTCTGCTTT AAAGCTAACA 960 

TGTTCAATAA CTGCCTCCGA ATTTGCCGCA TAGCGgAAGG TCACATCCTT AAACTCGACC 1020 

TGACCTTTGA AGTTTTCATC AGTCAGCTGC ACTTGAACAG GGTTTTGGAT AGAAGAATGC 1080 

AAATCTAAAA CTTGATTAAT CCGCTTAGCA GAGACCATAG TTCGGGGAAG AACGATGAAG 1140 

AGTGCTCCCA TGAGAAGGAA GCCCATGACA ACCTACATGG CATAAGACAT GAAAACAATC 1200 

ATGTCACTAA AGAGAGGCAG ACGCGCTATC GGAGCAGCGT CGTTAATCAC ATAGGCCCCA 1260 

ATCCAGTAAA TCGCCACACT CAAACCACTT GAAATCCCCA TCATGATAGG ATTCAAAATA 1320 

GCCATAAGAC GGTTGACAAA CAAATTCAAA CGGGTCAATT CATCATTTAC TGCTGCAAAT 1380 

TTTTCATTTT GATAATCCTC TGCATTGTAG GCACGAACGA CACGAATACC TGTTAAACTC 1440 

TCACGAGTGA TACTGTTCAG TTTATCTGTC AGCCCCTGAA TCAAGGACTG TTTTGGAAAG 1500 

GCTAGCGTCA TCAAAACGGT CGTCATCAGG ACGTTGATAA TCACTGCCAC AAGTACGGCC 1560 

CAGAGCCAGT ATTCTGAATG ACCTAAAATC TTCCCAATAG CCCAGATAGC CATAATTGAA 1620 

CCACGCGTTA CCACTTGCAA GCCCATAGTA ATCAACATTT GAACTTGAGT AATGTCATTG 1680 

GTAGTACGCG TCAAGAGGCT AGGAATTGAA AATTTCTTAA TCTCTGTCTG CGAGTAATCC 1740 

AAAACTCGGT TAAAAATATC ACTTCTCAGC CTACTAGTAT AAGAAGCCGC CACTCGGGAT 1800 

GCAAAAAATC CAACTGCAAC TACGGACAAG AAGGCAAGAA AGGACATTCC CATCATCATG 1860 

CTTGCCGACT GCCACAACTC ATCTAAATTA GTTTCTTGAC TACCTAGCAA ATCCGTAATT 1920 
TTCGAGATAT AGGTCGGCAC TTCCAACTCT AGATAGACCG AAAAGCAAGT AAAGAGAATG 1980

GCTAGTAAAA TCATCCCCCA TTCTTTTCTA CTAATTCTTT TGGCTAATTT CTTTATTCTC 2040 

TCCTCCTATT CCCTTGATAT TTTGCCTGTA GTTGACCGAG AACCTTCTCA AAAATCAGTA 2100 

ATTCATCTTC ATCAATGTCT TCCATCAACT GCTTGTCTAT GCGTTCAAAA AAAGCCTTAA 2160 

CCTGTTGCAT CTGAGAACGT GCTTTGTCCG TCAGACGAAC AAACTTAGCC CGCTTATCAA 2220 

CAGGACTCGC CTCCAATTCC ACCAAACCAT TTTGCACTAT ACGCTTAACC AGATTACTAG 2280 

CAACAGGCTT GGTAATATTG AGTTCCTGCT CGATATCTTT AATCAAGACC AAGTCTTGGT 2340 

TTTTCTCGCG ATTATCCAAA AAACGCACAA CCTGACCTTG CGGCCCACCC ATAAATTCAA 2400 

TGCCGCAACG TTTGGCTTCC TTTTGCACCA TCAGGTGAAT TTGATGACCA AAACGCTTAA 2460 

AGACTAACAT CGGTTTATCC ATAATCTCCC CCTTCTAAAT AAAAATAGTT CTCTGGAGAA 2520 

TAATTAAATT TCTATGAGAA CTATTTTCTT GATTAAAAAA ATCCCAAGTG ATTTTCTCAC 2580 

TTAGGATCAT GTTCTATAGG TTAAATTAAA ACCCATCTAC GTTCGTATAA ATCTTTTGGA 2640 

CGTCTTCGTC GTCTTCAAGA ACGCTGTAAA GTTTTTCAAA GGTTTCAAGG TCTTCGCCTG 2700 

ACAATTCCAC TTCTGACTGA GGAATCATTT CCAATTCAGT CACTTGGAAT TCTTCAATAC 2760 

CAGACTCACG GAGGGCAACG ATAGCCTTGT GAAGGTCAGT TGGCGCTGTG TAAACTGTGA 2820 

TTGTACCTTC TTGTGCTTCT ACGTCATCCA CATCCACATC CGCTTCGAGC AATTGCTCAA 2880 

AGACTGCGTC CGCATCTTCA CCTCCAAATA CAATAACACC TTTGTTGTCA AAGAGGTAAG 2940 

AAACAGAACC TGAAGCGCCC ATGTTTCCGC CGTTTTTACC AAAGGCTGCA CGGACATTGG 3000 

CTGCTGTACG GTTGACGTTA GAAGTCAAAG TATCCACAAT TAGCATAGAG CCATTTGGCC 3060 

CAAAACCTTC GTAACGTCCT TCTGTAAAGG TTTCGTCTGT GTTTCCTTTG GCTTTATCAA 3120 

TCGCTTTATC GATAATGTGT TTTGGCACTT GGGCTTGTTT AGCACGGTCG ATAACGAATT 3180 

TCAAAGCTGA GTTTGATTCT GGATCTGGAT CACCTTTTTT AGCTGCTACA TAGATTTCTA 3240 

CACCAAATTT TGCATATACT TTAGAGTTAG CTCCATCTTT AGCCGTTTTC TTGGCTACGA 3300 

TATTGGCCCA TTTACGTCCC ATTAGGAATC TCCTTTTTTC ACATTTTAAT CTTTCTTATT 3360 

ATAACACAAG TTTTTTTGAT TTTCACTAGA GGAAATGGAT TTTATTAGCA AATCAAGCTA 3420 

GGATAGCACT TTACCTGCTA AGATGGTCTT GCCTTTCTAT CTTTATCAAC AGGCACTCAT 3480 

CCACATTCAA AAAACAAACT AGACCATTAT CTGCAAATAG AAAGTTTCAG CCAAGTTTGA 3540 

CAAAGTCAGC TCAAATTACT GTTTGAAGTT TGTAGATATA AGCGACAAAA ACAATCATAC 3600 

TGCACCTTTT GTTGACAGTC TACTCCAGAC ATATCATAGT TCAAGTAAAT ACTTTGAAAT 3660 
TCAACAGTTC TTATAGGCGC TATTGTATTC TAAGAAATCA ATAGAAGAGT TTCTAAGCAA 3720

ACCTCTAATA CTCAATAAAA ATCAAAGAGC AAACTAGAAA GCTAGCCTCA GGTTGCTCAA 3780

AACACTGTTT TGAGGTTGCG GATGGGGCTG ACATGGTTTG AAGAGATTTT CGAAGAGTAT 3840

AATTTACGTG TTCCCAAGAT GGAGAAGTTA GACTAGTACA CTGGCACTTC TAAAACATTG 3900

CTAGCAATTG ATTTGTTCAT ATTTAATTTC ATTTTTTCCA TAAATGGGTA TTAGATATAA 3960

ACAGCAAAAT ATTTCCGATA CGTGTCGTTC TTGAATTTCC AATCATCTAA AACAAGTAAA 4020

GGATAATCAA TCCCCTGTAT ATCAAGGAAT TGGCTACCCT TTTTACTTTT TTACACATTC 4080

TGTTTGATAG ATTCATTTTA ACATCACGAG CATACTCCAA TGGAAATCGC TAGGCAAGAG 4140

ATAAACTTTC AGATATCCGC AGAGAGATCA TCGCCTCTTT TTGTCGCAAG CATTCTCCTC 4200

TCCTAGTCAT TTTCTACCTT ATCTTCTACC TGAGGATAGA GAGTTGTTCC CCAAATAGAA 4260

ATCGTCCGCT TACGCACTAG TGGCAAATCG GTTTTTTCAT AAACCGTACG CCACCATTCC 4320

CAGGCAAGCC CGGTACACTC TCTAATTTTG ACAGAGAGAT TACGAACATT CCCTTTTAAA 4380

GGAATACTAG TGGTAAAGTG AGCCGTTAAA TCCTGCCCAT TTCTGTCCCA AGCCTTAGGA 4440

GTCAAGACTT CCTTACCTTG ATGATCATAG GATAATTCAT TCCAAGTAAT ATAATATTGG 4500

GCAACATAGG CACCACTATG ATCCAGCAGT AAATCTCCGT TTCTGTAAGC TGTAACCTTA 4560

GTCTCAACAT AGTCTGTACT ATTTTGAAAG GTCGCAACTA CATTGTCACG TAAAAAAGAA 4620

GTTGTATAGG AAATCGGCAA GCCTGGATGA TCTGCTGTAA AGCGACTGCC TTCTTGAATC 4680

AAGTCCTCTA CCATATCCAC CTTGCCTGTT ACAACTCGGG CACCCGAACT TGGGTCGCCC 4740

CCTAAAATAA CCGCCTTCAC TTCTGTATTG TCCAAAATCT GTTTCCACTC TGTCTGAGGA 4800

GCTACCTTGA CTCCTTTTAT CAAAGCTTCA AAAGCAGCCT CTACTTCATC ACTCTTACTC 4860

GTGGTTTCCA ACTTGAGATA GACTTGGCGC CCATAAGCAA CACTCGAAAT ATAGACCAAA 4920

GGACGCTCTG CAGAAATTCC TCTCTGTTTT AAATCCTCTA CCGTTACAGT ATCTTGAAAC 4980

ACATCTCCTG GATTTTTAAC AGCATCTACG CTGACTGTAT AATAAATCTG CTTAAAATTA 5040

ACAATCTGAA TCTGCTTTTC GCCTGAATGG ACAGAGTTAA AATCAATATC AAGAGAATTC 5100

CCTGTCTTTT CAAAGTCAGA ACCAAACTTG ACCTTGAGTT GTTCCATGCT GTGAGCCGTG 5160

ATTTTTTCAT ACTGCATTCT AGCTGGGACA TTATTGACCT GACCATAATC TTGATGCCAC 5220

TTAGCCAACA AATCGTTTAC CGCTCCGCGA ACACTTGAAT TGCTGGGGTC TTCCACTTGG 5280

AGAAAGCTAT CGCTACTTGC CAAACCAGGC AAATCAATAC TATAAGTCAT CGGAGCACGA 5340

TCGACCGCAA GAAGAGTGGG ATTATTCTCT AACAAGGTCT CATCCACTAC GAGAAGTGCT 5400

CCAGGATAGA GGCGACTGTC GTTGGTAGCT GTTACAGAAA TATCACTTGT ATTTGTCGAC 5460

AAGCTCCGCT TCTTTCTTTC GATAACAACA AACTCATCGG GTAGCTGATT ACCCTCTTTG 5520

ATGAAACGAT TTTCAATACT TTCTCCCTGA TGGGTCAAGA GTTTCTTTTT ATCGTAATTC 5580 

ATAGCTAGTA TAAAGTCATT TACTGCTTTA TTTGCCATCT TCTACCTCCT AATAAGTTCC 5640 

TGGATTGAGT TGCATAAACT CAGACTTGTT CAGCGAAATC AGCCGTGGTT GGACTAAGTA 5700 

ATCCAAAATT TCCTCGTACA ATTCTTCTGA GACATTGCGT CGCCGTCTGG CTAAATAAGA 5760 

AGTCGGAATG ACCGTATTAT CCAACATAAA TACCTTATCT AAGTCAATCA AGGTTGGTCT 5820 

TGTAAAAGGA TTACGAGCTA GATCCGGCTC TTCTATCATA AAGTTCTTGA CCAAACGTCT 5880 

GGTCAAGAGA GCTGGTTTGA AGGTCTGATT TTTAACCAAC TCTTTGTTTT TAGTCATGCT 5940 

GTTGTCAATA CAGATATACA TATGATTCTT CACAGCCAAA TCGCTACTAA TAGTCGGAAA 6000

AGGCAAATAA AGAGCTACAA CATCTCCTCT CTTAATCAAG CAAGAGCACC CCCTTTTCTC 6060 

CTAATGTAAC ATAGACAGGA TTGACCAAGT CTTCTGATTG ACTCAGAATT TCCAAAGTTT 6120 

GAGTTTGGCG CGCTGTCAAT TTAGTAGCAT CTTGTCTCTT CAATACAAAA TGCTTGTCGC 6180 

CAATAACCTT GACAATATAA TCCTTCTCCA AAGCTGACTG GTAAATCCAC ATCAGATGTT 6240 

GTCTGTCCTG AGAACTCAAG AGAGAAGGAT TTTCAAGCCT CCCGATAGTC TGATAAAAAT 6300 

CAAAAACAGG AGCTAACTCC TGCCAATCTG ATTGGCTAGT TGTCAAGGCT AGAAAAAGGG 6360 

CTTTGCGAGC TGATACTTCT TGGTTAGCCT TGAGAGTTAC TTTCCCCTCC AAGTTTTTTA 6420 

GAAATCGGGA AACTCCAGAA AGCAAATTTT TCTCTAACTG CGAGAAATAA AAACCTTTCG 6480 

TTCCCAGACA TAAGTCTTTC ATGTCGCTTT CTCTAGCAAA TAAGAGCTCA AACATTTGAT 6540 

AGTAAAAGAA AAATATCTGG CACTGGGTCG CGCTCATCTT TTCCTTATCG GCTTCTTTTT 6600 

TTAACCAGAG CAAGGGCGAC AGGTAGCTGG ATTGAGACAT TTCCTCTACC TCCTACTCTT 6660 

TTTTAACTGG AGCATCTGCA CTAGCTGCCA CTTCTTTTGA CTGGATACTT TCCCACTGGT 6720 

TAATCTCCTC TGAGATAAGA CCTTCGCATG TCTTGACAAA TAGGGCAAAA GCCTTGGTCT 6780 

TTCCTGCATA TTTCTCCGTT TGGCATTGAT AGAGGAATTT TTCTTTCTCC AGGAGTTGCG 6840 

CAGTTTTTTG GTAAGAAATC CAATTTTCCT TTGCATTATA CAAATTGATA ATCCCCTCAC 6900 

ACAGCAAGCC GAGACTGGAT AAGGCAACCG AAATCAAACG GTAGCGATCA CCTGGCATAG 6960 

GAATAGCACA AAAGACAGCT ATGAGGAAAC CTGCCACGAT TTCTGTTATT TTTAATACCT 7020 

TATAGCGCCT ACGATGTTGA ACGCTTTTCT TTAAAAAATG AGCTATCTGT ACGTCTAATC 7080 

GCTCTGTCAG GTACATTTCT TCTGGCGTCA TATTCGTAAC TCCTTTCATT TACTTTGATA 7140 

ATCAGGG 7147 
(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 755 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

CCGCATGGGA TTGGTGTCCT TTTGGGCAAT CTCTTTGACC AAACTGGAAA CATGTTTTAT 60

GCGCCTGCCT TTACTGCCCT TGTCGGCGGT ACGTCTATAT GATCCTAGTC GCAAAAGTTC 120

CGCGCTTTGG AGCCATTACC ACTATCGGCC TTGTCATTGC CCTCTTTTTC TTGGGAACTA 180

AACACGGTGC TGGTTCCTTC CTTCCTGGAA TTATCTGTGG CCTCCTAGCA GATGGAGTAG 240

CTCATTTAGG AAAATACAAG GACAAAACAA AGAACTTCCT TTCTTTCATT ATTTTCGCCT 300

TTAGTACAAC AGGACCAATC TTGCTTATGT GGATTGCGCC CAAAGCCTAT ATGGCTACTC 360

TTCTGGCAAG AGGAAAATCC CAAGAATATA TCGACCGTAT CATGGTCGCT CCAAACCCTG 420

GAACTGTCCT TCTATTTATC GCAAGTATTG TCATCGGAGC CCTAGTGGGT GCCTTGATTG 480

GACAAGCCTT GAGTAAAAAA TTTGCCCAGA AAATCTGATC AGTTAAAAAG AGCCACGCGG 540

CTCTTTTTTA TTTATGGCTC AATTTCTTAG TCAAGAAATC TCCCAAGAAT TGGATTGCAA 600

AGATAATCAA AATGATAATA ATGGTTGCCA AGATGGTCAC ATCGTGATTG TAGCGGTTAA 660

ATCCATAAGC GATGGCTACG TTACCGATAC CACCAGCTCC AACCGCACCG GCCATAGCTG 720

TTtcCCAACA AGGGaAtCAA GGTcACAGTC GTCAC 755



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3010 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

TTCAATTGGT ATCTCAATCA ACGGTCTTCA CATGGTTTCA ACTGGTTTGA CTCTTGAAAA 60 

AGCGAAAGCT GCTGGTTACA ACGCAACTGA AACAGGCTTT AACGATCTTC AAAAACCAGA 120 

ATTCATGAAA CATGACAACC ATGAAGTAGC AATTAAGATT GTCTTTGACA AAGATAGCCG 180 

TGAAATTCTT GGTGCCCAAA TGGTTTCACA TGATATTGCA ATTAGCATGG GAATCCACAT 240 

GTTCTCACTT GCTATCCAAG AGCATGTGAC AATTGATAAA TTGGCATTGA CAGACCTCTT 300 
CTTCTTGCCA CACTTCAACA AACCATACAA CTACATCACA ATGGCTGCCC TTACGGCTGA 360

AAATTAAAAA TGAATGAGCT ATCTGGCCTT AAGTTAAGGT CAGATAGTTT TTAGCTAATT 420

TGTCCCCATA CAATTATAGT TTTTTTATCT TGTGCTTCAT TCTGTTCTGA CTTAAAATGA 480

AAAGGTAGCT ACCAATACAA ATGATGAGGA TAAAACAAAT GACTGAAAAT CGTTATGAAC 540

TAAATAAAAA CTTGGCACAG ATGCTCAAGG GTGGTGTTAT TATGGATGTG CAGAATCCTG 600

AACAGGCTCG TATCGCAGAA GCTGCTGGTG CGGCAGCTGT GATGGCCTTG GAACGAATTC 660

CGGCTGATAT TCGTGCAGCT GGAGGAGTTT CCCGCATGAG CGACCCAAAG ATGATTAAGG 720

AAATCCAAGA AGCGGTTAGT ATTCCAGTAA TGGCTAAGGT CAGAATCGGG CATTTTGTTG 780

AAGCTCAGAT TTTAGAGGCT ATTGAAATTG ATTATATCGA CGAGAGTGAA GTTCTATCTC 840

CAGCTGATGA CCGTTTCCAT GTGGACAAGA AAGAATTCCA AGTTCCTTTT GTCTGTGGTG 900

CTAAGGATTT GGGTGAAGCC TTGCGTCGTA TCGCTGAAGG TGCTTCCATG ATTCGTACCA 960

AAGGAGAACC AGGGACAGGG GATATCGTCC AAGCTGTTCG TCATATGCGT ATGATGAATC 1020

AGGAAATTCG CCGCATTCAA AACTTACGTG AGGACGAGCT TTATGTTGCT GCCAAGGATT 1080

TGCAAGTCCC TGTAGAATTG GTCCAATATG TTCATGAACA TGGAAAATTG CCAGTTGTAA 1140

ATTTCGCTGC TGGAGGTGTT GCAACGCCAG CAGATGCTGC GTTAATGATG CAATTAGGGG 1200

CAGAGGGGGT CTTTGTCGGT TCAGGTATTT TCAAGTCAGG AGATCCTGTT AAACGAGCGA 1260

GTGCCATTGT TAAGGCTGTG ACTAACTTCC GTAATCCTCA AATCCTAGCT CAAATCTCTG 1320

AAGATTTAGG AGAAGCCATG GTTGGTATTA ATGAAAATGA AATCCAAATT CTCATGGCTG 1380

AACGAGGAAA ATAGATGAAA ATCGGAATAT TGGCCTTGCA AGGGGCCTTT GCAGAACATG 1440

CAAAAGTGCT AGATCAATTA GGTGTCGAGA GTGTAGAACT CAGAAATCTA GATGATTTTC 1500

AGCAAGATCA GAGTGACTTG TCGGGTTTGA TTTTGCCTGG TGGTGAGTCT ACAACCATGG 1560

GCAAGCTCTT ACGTGACCAG AACATGCTAC TTCCCATCCG AGAAGCCATT CTATCTGGCT 1620

TACCAGTGTT TGGGACCTGT GCGGGCTTAA TTTTGCTGGC TAAGGAAATC ACrrCTCAGA 1680

AAGAGAGTCA TCTAGGAACT ATGGATATGG TGGTCGAGCG TAATGCTTAT GGGCGCCAAT 1740

TAGGAAGTTT CTACACGGAA GCAGAATGTA AGGGAGTTGG CAAGATTCCA ATGACCTTTA 1800

TCCGTGGTCC GATTATCAGT AGTGTTGGTG AGGGTGTAGA AATTTTAGCA ACAGTGAACA 1860

ATCAAATTGT TGCAGCCCAA GAAAAAAATA TGTTGGTAAG TTCTTTTCAT CCAGAATTGA 1920

CTGATGATGT GCGCTTGCAC CAGTACTTTA TCAATATGTG TAAAGAAAAA AGTTGAGATT 1980

GAATTTCTCA ACTTTTTTAC ATGTAATAAA CAATAGCGAT GTATTGAAGT GCGGACGCAG 2040

CTAGGATAAA GAGATGCCAA ATCATGTGGA AATAAGGTTT TTTCTTGGCA TAAAATCCAG 2100

CTCCAACTGT ATAACAGAGT CCGCCAGTTA CCATGAGACT CCAGAAAACG GGTGTCGTTT 2160 

GACTGATAAT GGCAGGAATG ATAGCCAGAA CCAACCAGCC CATAATCAGG TAAAGAGCAA 2220 

GGCTAAATTT CTCATTGACC TTTTTAGCAA AGATTTTATA GAGAATACCA AAGATGGTCG 2280 

TTCCCCATTG GATGACAATA ATCAGATAGC CAAACCAGTT ATTCATCAAG GTCAAGACAA 2340 

CGGGCGTGTA TGAGCCGGCA ATGGCAACGT AAATCATAGA ATGGTCAATG ATTCGCAAAA 2400 

CATATTTGTG GGTCGAACCA TAGGCCATAG AGTGATAAAT GGTGGATGAT AGGAACATGA 2460 

GAAAGAGACT GATGACGAAA ATGGAAACGC CGATAGAGGA TAAAAATCCG TGTGCTTCAT 2520 

AACTATAGAT GGATGAAATA GGCAGCAAGA TAAGCATGAT GACTGCACCC ACAGCATGGG 2580 

TCACGCTATT AGCAATCTCC TCTCCAAAAC TGAGTTGTTT GCTGAGTTTA AGACTAGTGT 2640 

TCATTGGATT ACCTCCTCTT GAGTATGATC GATTAAGTCT AGAGTTTGAT GATAGAGTTT 2700 

AACGGTTTGG CAGCTGGTTT GGATAATAGG GTTAGCTGGG TCAATTCCTT GGTTCATGTA 2760 

GTCCACAAAA GCATCGTAGA GTTGGTCTGA ACTTGCTTGA GTTTGTAGAG TATTAAGTGT 2820 

CTGGGCTATT TCTTGAATAG AAAATACAGA CTTGAGGGTT GTGATAGCAA TCAAACGGGC 2880 

AATCTGTTGG CGTTGGTATT TTTTTTTGTC AGGCTTTGTC AGGTAACCAT TTTTCACATA 2940 

ATTGTTGACC ATAGATGCTG TTAGGCCCTT GTCTTTATTA GGAGAGATAG GGGCGCAGAC 3000 

CTGATTGACA 3010 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15213 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

CATAAATCGG TGCAAATAAC TTAATAGTGA AGTAGCCATT TCTTTCGTAT TTACCTGAGG 60 

CATATTCCCT AGACGAAAGA ATATTATTAT CAATCAAATC ATTGAATGAA CGTAGTCTTT 120 

CAACTTCTTC TACTGTTAGA TTTCTGACAA CATTTGTTGC ATAGACCTTA TTTCCATCAG 180 

GATCAGGATG GTACTCATTT GTAACTTTTC TAAGAAGTTG TTGTTTTTGA TTCGTATCCA 240 

ATTTAAGAAT TGAATTTCCT TCGAGATATT CCAACATATA AACAACGTCA AACATGTTGT 300 

GGACATATTG CTTCAAATCA TCTGCATTAT TAAATCTTGT AGTTGGATCA AGTACTTGTA 360 

ATCGTCGACT TTCTGTACTA TCAGATTTTG AATGTTTCAA GATGGAGTTG ATGGTAATGG 420 
TCGCATCATC TGGATGGTCT GGTGCTTGTA ATAATCCTTT AGCAAAGAAC TCTGGTCCCA 480

AGCCACTTCT TCGACCATAT CCTCCAAGAT AAATGTCCTG ATCTGAGTCA TGTGTCATCT 540 

CATGCGTATA AGTAATAGCT CCATCCTTAT CCAACATTCG ATAACCCATA TAATAAACTG 600 

CATCACCTGT AGCATAAGCA CCGTGTTGAT TATGCCCAAC TTTATTTCCA ACAGGTCCAA 660 

AGAAATGTTG CATTGCAGGA TTTGGATTAT CAAAATCTGC CACTTCTGTA GCTTTCCCTA 720 

CGGTATTATC ATCGCCAAAT TTATAAGCAT CGTAAAGCAA AATATTTCTA TAAAGTTTTT 780 

CACGTGCATT GTCGTCTAAA ATACGATACC AATAATCGTA GTGATCTCGC TGACGTTTGG 840 

CTGTTTCACG CGCATTTTCT TCAACAAAAT CATTGAGAGC CTTGCCCGCT TTATGGTCAC 900 

TACTGCGGTA GCGATCATAA GCTCCAAATC CTAGACTAGA CATGGTCGAG ATGACAAATA 960 

CGGATCTCTC TGGCAAGGTC AGGAGAGGCA AGACCATATT GCGGTATTTC CATGTGGCAC 1020 

TCGTGATACG ATCATAAACA CCGATAGAAT ACTTGGTGCC AGCTAACCCT TGCTTCGTTT 1080 

TCACCTCTTC GATAGTGGAT TTTTCTTCGA CAATGTAAGC CTTAGTCTCT GATTTAAACC 1140 

AGTCATTATT GCTTGTATTT GGTAAAAAGA CTTTTCGGTA ATGTTCCAGC GTGCTAAACA 1200 

AATCTGTCGT TCCATGTTGA CTGGCAAGAC TGATACCATA AGTATCGACA TTATTCTTAG 1260 

CTAGAAGATT GTTAAAGCCA GATTTACCCA ACTCAATCAG AGTATCTAAT GGTGAAGCAT 1320 

TCCCCTTACC AAAGAAGTCC AAATGGTACA GAACTAGGTC TTTGACATTC ACCTGACCAT 1380 

AGCTAAAGTT ATACCACCGT TCCAGATAGG TCAAGCCAAG TAGCAAGGCT TCCTTGTTGC 1440 

GTTTGATTTT ATCTACAAGA TAACCTTCAG TGACGGGGTT AGCACTAGCC AGTCCAGCAT 1500 

CCGCTGACAA GAGTTTTTTC AAACTGTCTT CCAGTTGTTG TTTTGTTTTG GCGAACTGGT 1560 

CTTCTAGATA GAGCTCAGTT TGCTTGACGT TTGGAGAAAT ACCCAGCGTC TTTCTGATGG 1620 

CTTCTGAATG ATAGTCAACC TTTTGTAAGT CAGGTAAGAC TTGCTTGATG ATAGAGGTTT 1680 

GGTCATACAG GAATTGGTTT GGCGTATAGA GAAGTCCAGT ATTGCCCAGA CTATATTCTG 1740 

CTAATTTGGC GAAATCATTC TGGTATTTGA GATCCAGCTT CTCAGATAAA TCATCCTTGT 1800 

AGTGAAGCAA GAGTTTGTTT GCAGTCTGTT TGTTAGAAAC AATGTCTGTG ATGACTTGGT 1860 

TGTCCTTCAT CATGACTGCT GACAAGAGTT CTTTTTGATA TAAAAGACTG TTCTCATTGA 1920 

CCAGGTTTCC GTATTTGACG ATGGTTGCCT TGTTGTAGAA AGGTAGCAAT TTTTCAATGT 1980 

TTTTATAAGT CAAGTTGCGC TTAGCTTGAT AATAGGCCAC CTTAGAAAAA TCACTGTCTT 2040 

TTTTGCCACT TGTTGAAAGT GGCTCCACTG TTGGTAAAAT GAGAGGATTG ATTTCTGCTT 2100 

TTTTGCTTGC AATTTGAGAA GCATCTAGCA TTGTTCCTCT TTCTTCAAAG GATTCCTTGC 2160 
TGACGACCTC ATCCTTGACC AAGGTGACAT TGTAGACTCT GTTGGCCTTG CTGCTGAATG 2220

TGTCCTTTAC CTTCATTTCG TTATAGTGGT AACCAGTGAT GGCATTTCCG TTGGTTACAT 2280 

TAACATCGCT GAGAACATTG GTCAAACTTC CAGCATGCCT AACATCACCA GAAGTTCGAT 2340 

CCCACAAATT GCCTGCCACT CCAGCGACTC TACCAAAGTG CTTGACATTG TTGATATCAC 2400 

CTTCAGCATA GCTATCTTGG ATCTGTGCAT CTCGGTCTAC TAGGCCTGCA AGTCCACCCA 2460 

CAGTCTGATC TGAAGTATTT GTGTTAGATG AAATGGCTAC TGTCGCTTTT GACTTAGTAA 2520 

GTAAAGCCTT GTCACCTGTC AAATGACCGA CCATACCACC GATATTGTAG GCAGCAGTCG 2580 

TTTCATAAGT GTTGATAATT CTTCCCTTGA AACTGCTCTC TGTGATGCTT GATTGCTCAG 2640 

CCTTAGCCAG CAAACCACCG ATACCACGTT CACCAGCCAG AACACCATCG ACGTGAACTT 2700 

GCTTAATTTT TGTGTTATTC TGAGCTTCAT TTGCCAGTGA ACCGATATCA TCTTTCCCTG 2760 

AAATAGCAAC ATTTTTTAGA CTCAGTTTTT CTACTGTAGC ACCACTCAAG TTTTCAAACA 2820 

GAGGTTTTTT CAAATTATAG ATAGCATAAT TCTTGCCATC TTTTTCACCG ATTAAACGAC 2880 

CAGTAAAGGT GTCCTTGATA TAGGATCTTT CATCAGGACC AAGCTCCACT TCGTTAGCAT 2940 

TCAGGCTGGC CGCTAAATGA TAGGTTCCAG AGGGATTTTG GTTTATAGCT TTGACCAGAT 3000 

TACTAAAGGA AGTAAAGTTT GTTGTTTCTT CTGTTCCCTT CTTAGCTAGA TAGAAGGTAA 3060 

AATTATCTTT ATATCTGCTT TCTATCTCCT GCTGAAGCTT CTCTACTTTT GCTGTGATTT 3120 

TATAAAGGAT TTTATCATTT TTTCTTTCCT CTGATATTGA TGCTACTGGT AGGTATACAT 3180 

CTTTGAATGA AGAAGATTTC ACTTTAACAA AGTAGCTATT TGGATTGCTT GGAACTTGCT 3240 

CTAACGAAAT GTGTTGTTTA TAAGTACCAT TTGACAAACT GTATAACTCT AGGTCGGAAA 3300 

CATTTCTTAA TTCAAGTGTT TTCTCTGGTT CTTCTACCTT TTTATCAGGG TCTAGTTCAT 3360 

TTTCTTGTTT AATTTCTTCG TTTCCATTTG AATTGGATGT GTTTGATTCG GTTGAAACAT 3420 

CCTCAGTTGA ATTTCCGTTT GATGGTTCTG GTTCTGTTTG TCCATTCTCT GATGTTGTAT 3480 

TACCTGAATT TTCTGGTTTT GTTGCAGTTC CGTTTTTTTC TGGTTGATTT GATTCTTCAA 3540 

CTGGTGGTTT TGAATCACTA GGTTTATTGG ATACTTCTCC AGTATTTTCG TTAGCTATTT 3600 

TCCCAGAGTT TGTTTGTGTT TCTTCTGCAG GTTGAACTGG TTTTTCTGTT TCTTGATTTG 3660 

AGGTACCTTC TACTGTGCCT TCATTTGGAT TTACTGGAAC TTCTTCTACA GTTTTTTCTG 3720 

AATTTTCATT TTTAGAGTCA TTATGTTCTG GTTTATTTGA TTCTCCAACT GAGGTTGTCG 3780 

AATCACTAGG ATTACTGGAC ACTTCCCCAG TATTTTTGCT AGATGTATCT GGTGATACTT 3840 

TCTCTGAATT CGTTGTTGAT TCTTCTGCAG GTTGAACTGG ATTTTCTGCT TCTTGAATTG 3900 

AGGTTCCTTC TGTAGTACCT TCATTTGGAT TTACTGGTGT TTCTTCTGTT GGTTTTACTG 3960 
GAACTTCTTC AGTTTTTTCT GGACCTTGTT CTTTGGTCTT CTCAACCGGA GTTTCAGGTT 4020

TTACTTGCTC AATATTACCC TTATATTCTG GAAGCGGTGC TACCTGCTCT GGTTCACCTT 4080 

TATCACTTAC CACAGTATCT GGCGACTCTG GTTGAACCTC AGTCTCACCT TTGTCGGTCA 4140 

CAACTGCTTC GGGTAATGTA GGTTGAACTT CTGGTTCGCC TTTGTCACTT ACTACAGCTT 4200 

CGGGCAACTC AGGCTGAATT GCGGGTTCAA CAATAGCTCC AGACTGTACG TCCTTATGTT 4260

CTACACCAGT CTCAGGTTGT TCCTTTATAA CTTGAGTTTT TTTAGTACCT TTTTCGACTA 4320 

TTCTTGGACT AGGCGCAGTC GTTGAAGTTG AAACAATTTC TCGCGAAACT TCTTCCTTGT 4380 

TTACAGAGAA TATTCTGACG ATTTCAACTT TCTTACCTAA TTTACCTTCT TGTTTTACTC 4440 

TTACAGTTCC TTCAGCTAAA TCAGGATTTT CTTGAATTTC TTCTTGAAAA TCTATTTTTG 4500 

TCTCCATAGT TTCCTCACGA TATAAGAGTT CAGGTTTGTT CAATTGACCT GATAAAACTT 4560 

CATCCTGTGG ATTTAATGTA TTTACCCCAG TCTTTTCTTT TGGAGAAATC TTCTCCTCTT 4620 

TCTTCGTTTC TAGATTCTTA TGTTCGGCTA ATTGTTCTTG AGAATCTGAA GATTGTTTCT 4680 

CTTCTTTTCT TGGATTGATT AATTCAGTAG AGAAAGGTTT TTCAACTACT TGAACTTCTG 4740 

TCGGCTTAGT TGAAGAAACA GGTGTTTGTT CCTGAATAGC TTGTACTGTT GATGGATGGT 4800 

CTACAAAATT CGGTGTAACA TTATAATCCA CCTTTTGTTG TTTTGTAGGA GTGGCAACTG 4860 

AACTCTTTTG ATTACTTACT TCAGACTCAG AAGTCGTTTT TCCCTCTTTG ATATATCCAA 4920 

TATAAGTGTA ACCTGAAATC TCTTTAGGAA GAGGTAATTT TTCTCCAGAG GTCAATTCAT 4980 

AGTCCGTATT GTAATTTAGC AAAAGATGAT TTTCTAAAGC ATGGACTGAA ACTAAGACAC 5040 

CATTTCCTAT CCCTGCAACC AATACTAAAT GTAATACCGT TTTATTCTTA ACCTTTTTCT 5100 

TGGAAACAGC AAAAATTAAA ATTCCCATAG CAGCTAAGCT AGCACCAGCA ACTAGGGCTT 5160 

GCCTCTCATT CTTGCTTCCA GTATTTGGCA ATTCCGCCAG TTGATTTTGA GAATTTAACT 5220 

TATAAACAAG ATAATAAGTT TCATCATCAT TCTCCACGTA TGTCGGAATA TCATAGACAA 5280 

GCTGCTTCTT TTCTTCTGAT GATAGCTCTG AATCTGCCAC ATATTTATAG TGAACTCCCG 5340 

CAGTTTCTTG AGCATCCACA GATGAACTAG CTAATACAGA CATAAAAAAT AAACTTGAAA 5400 

TCGTTGCAGA TACAAGTCCT ACTGATAATT TTCTAAATGA AAAACGCTCT TGTTTTTCAC 5460 

CAAAATACTT TTCCATTATT CCTCCTTGAA ATAAAATTTA TATATGTTAC AAAGACCTTT 5520 

ATTATATTAG TGTATTATCT ATTATCTATA GAAAAGGCAG TATACCTTAA TTATACTCTT 5580 

AATTTACAAA AAAGTCTTAA AATTGAGATG CGCTTTCATA CTTTGTTTTA TATTATTTGG 5640 

AGGTACAATA ACACCTACCA TGAAATTTAC ACGGTAGGTG TTACTCATAT CACTAATCGT 5700 
TCTAAAAATG GTTTGAGGCA GTTGAGGAGA ATTCCTTCTA TCCAGCTTCC TTGTGCTGAT 5760

GAGCGATGGT CTTCCTGCAG GCTTTTTTTT AGAAAATCTC GGACTTGTTC TGGTGCGATT 5820 

TCAAATTCAA AGGCTTTCAT TTTATAGAAA AAGTCGATGA GATGATCTGA CAGGTATTCA 5880 

GTTGAAAAGG GTACTTCACC ACTTTTTCTA TATTCTAATA AGAGTCTAGA AAATCGAGCT 5940 

TTTTCTTCAG GAAGCTCACG AAAATAGGAA TTGAGGATCC AAGTCTGCTT CTGTTTTCTT 6000 

TCAATTGGAT CCTGACTGGC AATTCGTTGG TCTTTTTCCA GCTCTTTTTG GTATTGTTTG 6060 

GCCTTGATAG CTCGTTCTGC TCTATTTTTA CCAAAAAGAA TTTTTTCCCA CTTGCGTTCT 6120 

TCTTGAGTCA GGGTCTCTGT AAAGCCAAAG TAATCTTGAT AAGCACGCTC TGCGGGTCCC 6180 

ATGGCTAGAA CCAGATTGTC TGCATATTGC TTGGCGATTT TATCCCTCTT CTTGCGTTCT 6240 

TTCTCTGCCT GGATACGGAG TTCTTGTTCG TAGTCAATTT TCTCCTTGCC TAGCTTGACA 6300

AGGTAGAGTT GGTCATCCGA TTTCCCAAGT AAAAAGGGTT TGATACACTT TTCAAGGACT 6360 

TCTTCCATCC GAGCCTTTTT CTTTGGTTCC GCCTTGGTCC AACTTCCTCC CTGAAAGACT 6420 

TCTAGGAAAA GCTGGTAGTC TCTCTCAGGC GCAAATTGAT TGCCACGATT GGGTTTGAAA 6480 

ACACCTTTTT CCCAGAGCCA TTTTAGAAGT CGCTCGTCAA AGTTACTTTT ATTGACCTTG 6540 

ATTTTTTCCT TTTTCTGAGC TTTTCTGGTT AGATTTTCAA CCTTTCTGAG CAGTTTTTCT 6600 

TCCTCTTCCA ATTGCTGGTC AAGGGACAAT CGATGAAAAT GACGAACACA GTCGCTACCA 6660 

ATTGGAAAGA GGCGTTGGCC TGTGACACCG TTAAAGAGTT CATAAGCGTA TTTGATGGCA 6720 

TTTCCACAGA CACAATTGCT ACGGCCGATA CCGTTAAAAA TAAAGGAAAC TTCATTCCAT 6780 

TCCTTGGTAG CTTGTTCCCA AGTATCCGCT TTCGAAGCCT GTAAAACTGC ATCGTGCAGG 6840 

GATTTTCTAA CTGGAAGTGT CATGAGGTCT CCTTTCTAAT ACTCAATAAA AATCAAAGAG 6900 

CAAACTAGAA AGCTAGCCGC AATCAGCTCA AAACACTGTT TTGAGGTTGT AGATAGAACT 6960 

GACGAAGTCA GCtCAAAACA CTGTTTTGAG GTTGTGGATA GAACTGACGA AGTCAgTAAC 7020 

CATATATACA GCAAGGCGAA GCTGACGTGG TTTGAAGAGA TTTTCAAAGA GTATAAGTTA 7080 

TACTTTTACA ACTTGAACCT CGTCTTTACC GAGTAAAATC AAGTATTTTT CAATATTTTC 7140 

AATCGAATAG GCTCGTGATA AAGCCTCTTC GTATAGAGCT AACTGACCAC GATAGCGGTC 7200 

TACGAGTTGA CTTGGTTCAT CATAGCGGTC TGTCTTGTAG TCGAACAGAA CAATTTTGTT 7260 

TTCGTAAAGC AGATAGCCAT CAAGGATACC ACGGACAACA AAGTCTTCCT GACTCTTTTG 7320 

GTCTCGTTTG AGCATGGAGA AAGGTTGCTC GCGATAAAGA TGGTCGGTAT TAGCAAGAAT 7380 

TTCCTGACCG AGTACTGTGT CAAAGAAAGC AAGAATTTTA TCAAGATTGA TCTTGTCTCT 7440 

GACAGCTTGG CTAGTTTGAA CTTGTTTGAG TGTTTCTGTT AGGCTAGCAA GGGTTAGTTG 7500 
CTGGCTGAGG TCAATTCTCT GCATGAGTTC GTGAGTAGCA CTACCAATCT CAGCTCCAGT 7560

TACCTTTTCT TTGGTTGAAA AATCTGGCAA ATCGAAGCTG ATTTTCTTGC CTACTGACTG 7620 

ACCTTGACCA GCAATCTCGA CACCTTCCAT ATCCATAACT GGTTCGTAGA ATTTCTTGAT 7680 

TTGACTTGGG GTTTGAACAC TAGGAAGTTC AATAGCTGCG CGGTGAAGAG TATTATAAAC 7740 

TTCCACCTCC TTCAGCATTT CCAGAGCTTC TTTGATGGTA TCTGACTGAC GATTGTCTGC 7800 

TTGGGAGCTA TCTTGGAGAG GACTCTTGGT TTCCAACTCT CCGATAGCTT CTCTGGTCAA 7860

CTGATCTTCG CCAATAAAAC GATAACTAAA GTTGAGCTTG TCCTTAGTAA ACACTTTACT 7920 

GATAGCCCAA AGCCAATCTT GGAAATTCCG TGCTTGCAGT CTAGTATTGC TATTTAGTTT 7980 

CCCATTTTTG GCTGCTGGGT ATTCCTTGGA TTCCAGCTTT TCACGAGAAC CCTTGCCGAC 8040

AAGATAGAGC TTTTTCTCAG CCCGCGTCAT AGCAACATAC AGCAAACGCA TCTGCTCAGA 8100 

ATAGCTTGCT AGCTGTAATT CCTCTTCGTT CTGCCTATAG GTCAGACTAG GAATGGAGAG 8160 

TTTGATGGTT TTAGGATAGT GGTCTTCTAC TGCCCCTGTC TCCATCTTGG CAATATATTT 8220 

GACACCAAGA CCATTCTGAC GACTGAGAAT GACTTCTGAC ATAGAGTCTT GCTTGTTGAA 8280 

ATCTTGATCC ATATTGAGGA TAAAGACGTA AGGAAACTCC AGCCCTTTAC TCTTGTGGAT 8340 

GGTCATGAGC TCTACTGCAT CTTTTGGCGG TGCGACGGCC ACGCTTGCCA AATCGTGCTG 8400 

GGCTTCTAAG ACTTGGTCAA TCATACGAAT AAAACGCGAC AAACCTTTGA AATTGCTCTT 8460 

TTCAAATTGA TCAGCACGCA GTGCTAGGGC ATAGAGATTG GCCTGCCTAG CAGGACCATT 8520 

CGGCAAAGCC CCAACATAGT CATAATAAAA ACGGTCGTTG TAAATCTTCC AAATCAAGTC 8580

ATAGAGAGAG TGGGTTTTGG CATACAAGCG CCAAGAAGCT AGGATATCCA TGAATTGCTT 8640 

TAGTTTTTCA GCTAGAGCTG TGTGAATCAA GCCTTTTTGA CTACTTGCCA TTTTTTGTGC 8700 

ATTGACCAGT TTCTCATAGA GATTTTCGTG -Ml-TT^ATCC TCTGCTTTCT GAAGGGACAA 8760 

ACGTGCTAGC TCATCCTCAT CAAAACCAAA CATTGGAGAC TTCATAAGGG CAACCAAGGC 8820 

GTAGTCTTGC AGGGGATTGT GAATGACACG AAGAGTGTCT AGCATGACTT GCACTTCTAG 8880 

GGATTGGAGA TAATTGTTTT GCTCTCCGTC AGTTTTGACA GGAATTCCGT ACTCAGACAG 8940 

GGCGAGGAGA ATCTGGTCAT TACGACTGCG GCTGGAGGTC AGAAGGGCAA TTTCCTTAAA 9000 

GGCAACACCT TTTTCTTGAT GAAGTTTCAG AATCTCCTTG ATAACTAAGC GCATTTCGCC 9060 

TGTTAGTTTC GTTTCTGTTT GACTCTCTTC TTCCTCACCT GTATCGTCCT TGTCGTAGAG 9120 

GAGAAATGCT GCCTTGTTGT CTGGATTGGG AGTCAGTTTG GTATTGGCAA AAACAAGCTG 9180 

GTGCTTGTTA TCATAGTTGA TTTCGCCGAC CTCTTGGTCC ATGAGACGTT CAAAGACATC 9240 
ATTGGTTGCT GACAGCACTT CTGAACTACT ACGGAAATTT TCCTTGAGGA TAATGAGCCT 9300

GCCTTCTTGG GGATTTTGCG CATAGCGTTG GAATTTCTCA TTGAAAATCT GCGGGTCTGC 9360 

CTGACGGAAA CGATAGATGG ATTGCTTGAT ATCTCCCACC ATAAAGCGAT TGTGGCCATT 9420 

AGACAACAAT TCCAGCATCC GTTCTTGAAT ATGGTTGGTA TCCTGATACT CATCGACCAT 9480 

GACTTCATGG AAGCGCTCCT GATAAGACTC ACGAACTTGT GGGAAATTCT CTAAAATCTC 9540 

AATGGTGTAA TGGCTGATAT CAGCGAATTC GAAGGCATTT TCCTGTCGTT TTCTCTGACG 9600 

ATAAGCCTCT ACAAAATCGC TCATGAAAGA TTGGAAGGTT TTAGCTAGTT TCCAAGTGTC 9660 

TCCATGATAA CGTTCTTGAT AGTCGAGAAT CGCTATCTGG TCTGATAATT GTCCTAGTTT 9720 

AGCAAACTGG GTCTTTCTCT CTTCGTTGTA GGCATCAGCC AGGGGCTTCA AATCAGCCTA 9780 

CGGCTGGCAT TAGTCAGAGC TCGACCGTTT TTCTCCTTAG AGATGGCGAC AACACGCGCA 9840 

AGCACTGCCT GATAAGCCTG ACTATCGGAC TCCTGATTTA GGGAGCCAAT TTCATCCAGA 9900 

ATTAACTGAA CATTTTCTAA ATAGGCAGCC TTTGCAAACT CCTTGGCATC GTTATCCAGA 9960 

TGGTAACGGA AAAAGCTTTC CAAATCCCAA AGGGCTTGTT TGATTTGCTC GGTCAGTTTT 10020 

TCTTTTTCAC TGGTAAAATC AGCTTTCTCA AATCCTTTGA GGAAAGATTC ACTCAGCCAC 10080 

TTTTGAGGAT TACTGGTGGA TTGGAGGAAG TCATAGATTT TATAGACCTG CTGGCGCAGA 10140 

CCCCGTTCGT CCTTGCCACG CCCAGCAAAG TTTTTCAGCA AATGACTAAA GGTCTCTTTC 10200 

TGTTTACCTT GGTAATGCGC TTCAAAGACC TCATGAAAGA CTTCGTTTTC GAGAATAAGT 10260 

TGCTCGCTTT GGTTTTGTAA AATACGGAAA TTAGGTGCAA TATCAAGCAG ATAACCATGT 10320 

TTGCCAAGGA ATTTTTGTGT GAAAGAATCC ATGGTTCCAA TGGCAGCGTT GGGTAGGTCT 10380 

GCCAACTGGC GACCCAAGTG TTGTTTGAGG TCGACATCAT CTGTTTCTTG GATTTTCTTG 10440 

CTGATTTTTT TCTCTAAACG TTCTTTAAGT TCAGTTGCAG CCTTGACGGT AAAGGTTGAG 10500 

ATAAAGAGTT GAGAAATTTC GACACCACGC GCCAATTGGT CCAGAATGCG CTCTGCCATG 10560 

ACAAAGGTCT TTCCAGAACC AGCCGATGCT GAGACCAGGA TATTCTGGGC AGAAGTGTAG 10620 

ATAGCTTCGA TTTGCTCGGC AGTTTTCTTC TGTTCCTTGC TCGAATTTGC TTCTGCTTCT 10680 

TGCAGTTTTT GAATCTCCTC CTCACTTAAA AAGGGAATAA GCTTCATCGA TTCAACTCCT 10740 

CTCTTATTTT TTCAAGCCAA GCTTGCTTGA GTTTTTCTCC GACCAGACGC TTGCCATCAG 10800 

CTAGGTCCAA CTTTTCTAGG AAACGGGCTT GGCCCAGATG GTAATTGGCT TCAAAGCCTG 10860 

TAATAGCCTG ATGTTGCTGG ACGTATGGGG CAATGCTTCT GCCATTTTCA GTATAAGGAT 10920 

TGATGGCGAA CCGGCCTGCT AAAATCTTCT CAGCAGCTTT CTTGTAAAGA TAGGCATTGT 10980 

AGTCCAGTAG GAGCTGAAAT TCCTCATCTG TCAGTTGATT AGCCTTGTTT TTGTTATAAA 11040 
ATTCGCCTAA ATAACTGCTT TCTTTTTCCA AGAAGAGCCC TTGGTATTTC ATAGATTTGC 11100

TGGCTTCTAC CACTGCTCCT GCCAGACTTT TTACCGCCAT CAGAGATTGG ACAGGTTCAG 11160 

CCATTTCCAA GTACATGGCG CCGAAAAAGT TCTGCTCCCC TTCTCTTTTT AGGGCAGCAA 11220 

GATAGGTTGG TAACTGAGAA TTGAGCCCAT TAAAGAAATG AGGAAACTGG AACTGAGTCA 11280 

GACTGGATTT GTAGTCTACT ACTCCTATCG CTCCATTAGC TTTCAAACGG TCAATCCGGT 11340 

CCACCTTGCC TCGTACAAAG ACACTGCGTC CATTGTCTAA TTGAATAAAG GCTTGGTCTT 11400 

TTCCACCAAA ATTTGCTTCT TCTTTGATGG TTTCGATGGC TGGATTGTGT CGGAGAATAT 11460

GTCCAGTTGT CCGTGCAACA TCAAGCAAAA CTTCCTTGGT AAACTGGGCT TCCAAACTTT 11520 

CTTGATAAAT AGCTTCAAAT TCGCGTTCTT GACTGGTTTC TTGAATAGCT TGTTCTAGAC 11580 

GTTGGTCAAA GGAATCTTCA TTAGGCAACT GTAAGGCGCG TTCAAAGATA CGATGCAAGA 11640 

AATTCCCGTG ACTACGGGCA TCAGGATGCA AACGTAATTC CTCCTGCAAG CCTAAAACGT 11700 

AGCGTAGGAA ATAACTGTAT TCATTGCGAT AAAACTCTGT CAAACCCGAC GTAGACAGGT 11760 

AAAACTCCTG TTTGGCAGGA TAGAGAGCTT GCAAGGTGTC CTTGGCTAAG GTCTTGCTGC 11820 

TTGGACTGGT TGGGATAGCT GGATTTTCCA GACCTTGCTG ATCTAGTTTT TTACCTATGA 11880 

CACGCGACAG AACCTTGACA AAAGTCAAAT CTTGCTCAGT ATCGCTCATC TCACCCTGCT 11940 

GGTGATAGGC AACCAGACTA GACAAAAGAC TGTGATAGGA CCCCATATCC TCCTTAGACA 12000 

GTCCTTTGTG ATTCATCCTC TTCTCTCTCC GCCTAAATCC AAAATGGATC AACTCTTGAA 12060 

GATAGGCAGA TTCCTTACTT TCACTTTCGT TAAAAAGGCT TGGAGCCGAC AAGAACAACT 12120 

GCTTACGAGC AGAATTGACC AAGGAAAGCA TAGTGTAGCG ATTTTTCTTG AGATTTTCAC 12180 

TGCTGGCAAT CAGTAATTGA ACGCCTTCTT CGGTCGCTTG GTTTAGGTTT TGCCTTTCTT 12240 

CATCTGTCAG AAGACTGGTG TTTTGAGAAA TTTTTGGTAA ATTGTCCTGA GTTAGTCCAA 12300 

TAGCATAGAC AAAGTCAGCA GTCAATGGTG CAATCAAATC GTAACTCTGC ACCAGAACAG 12360 

TGTCCACTGT TGCTGGAATG GTACGGTATT GGGACAAACT CATTCCAGAA TGGAGCAAGG 12420 

CTAGGAAGTC TTCCAGACTA ACCTGTGAAC CAGCAAAAAC AGTCGCAAAT TGTTCTAAAA 12480 

CATGGCAGAA AGCCTTCCAA ACTTCGGCTT GTCTTTCCTG TTCTACAGCT TCCAAAGTGG 12540 

TTGTCAAATC TTGTAACTGC TTGGTCACAG CTCCTTCTTT TAGAAAGACA CTCCATTTTT 12600 

GTAGGAGTTT TTCAGCCTTT TGTTTTCGGC TGGCAAAGAG GGTTTCAAGA GGTGCTAAAA 12660 

TTCTCAGGCG GAGGACATTC AAACGCTCAA GATTAAATTT TCCATGGTGG GATTTGGTGA 12720 

AGGTTTGCTG AAAGGCTGGC AAGCCATTGA TACCAAGATA GCGGATATAT TGCTCAAAAG 12780 
CATCAATATC AGACTGACTG AGGTCAGTAT ACAAATCAGT TCTAAGAAGA TTAATCAAAT 12840

CCTCCTGACG AAAACGGTAA CGTTTTAAAG CTAAAATAGA CTCGACAAAC TGAGTCAAGG 12900 

GATGATGAGC CATGGCTTCG CTTCTACCAA GATAAAAAGG AATCTGATAC TGGTCAAAAA 12960 

TGGTTTTGAG AGATAACTGG TAAGAAGCTA CATCCCCCAA GAGAATACGA AAATGCTTGT 13020 

AGCTCAGGTC TGAGTTCTCA TGTAATTTCT GACGAATACT ACGGGCTACT AGCTCCAACT 13080 

CCTCCTTTTG CGTCAAACAA GACCAGATTT GTAAATTTTC ACGGTCTTTC TCATCGACAT 13140 

CCAAAGCGAG TTCTGAAAAG TCATAAGAAG ACTCCAACAA ACGAGAGGCC TTGTCAAAAC 13200 

TATCCATCTT CTCATGAGTT TGAGAACAGT CCTGAGCAGG CGTTTGGTAT TTAGAAGCCA 13260 

GATGATGGAG AAATTTTACG CTGGCTTGGT AGAGATTGCC CTCGCTAAAA GGACTGGTAT 13320 

AGGCTTTCTT ACTAGCATAA GCCCCGATAA CAATCTCAAC ACCTTTGCCG TGAAGTAAGT 13380 

CCACAACCCG CTCTTCCTCA GCAGAAAAAC GAGTAAAGCC GTCAATGACC AAGGCGATTT 13440 

GATTAAAATC ACTACTTACC TTGTCATTCT CAATAGCCTC AATCAAATGG GACAACTGAC 13500 

TTTCCTGGGC TAACTGACCT TGATTAAGAT AGGCTGTTAC TTTCTCAAAA ATCAAGAGTA 13560 

AATCCGCCCT CTTATCCTCA TCTGTTAAAT TCTCCAAGTC CAAAAAACTC ATCTGAGATT 13620 

TGGTCATCTC ATGGTAAAGC TCAATTAACT GCTGGATCAA TTGAGGATCC TGCTTAATAG 13680 

CGCCATAAAC ACGCAAGTCC TTGGGATCGA GTTCGGCAAG GCATTTGTAA AAGGCCAACC 13740 

CAAGACCGAT ATCATCAAGA GTAGTTTTAG CTGGTAAATC ATTCAAGACC AGATAGCGAG 13800 

CCATTTGAGC AAAGCGCGTG ACGGTAATCG AAAAAGAAGC CTGCTGGGAC AAGTATTCCA 13860 

GCACGGCGCG TTCCTTTTCA AAAGAAAGAG AGTTGGGGGC AATGTAGAAG ACCCGCTTGC 13920 

CAGCTGCAAC TAGCTCTTCT GCCTCTCTTG TTAGAATTTC TGTCAAAGAA GTCCGAATAT 13980 

CAGTATAAAG TAATTTCATC TCAGCCTCGT TGGAATTTTT CATCACCCTA TATTATACCA 14040 

TGATTAGCCT CGTAAATCTG TTAAAATATT TAGGCCATCC TTTCTTTTCT TCATCATCTG 14100 

CTAAATCTTA AATACTTAGC TTTACTTGTA TTAGATAGAA TAAGTCTGGC TACTGAAAAT 14160 

CACATAATAA AAAAGCCTCG GTAACAAGGC TTTGAGTTTT ATGATTGTTT CTTAGGTACG 14220 

GAATACACTT CAATGTGTTG TCCCAGTATC TTAATGTCGA CTGGTAGATT GTCTGATTTA 14280 

TCGCCATCAA CATCGGACTC TAATTCGATA TCAGAAGAAG TTTTAATATT ACGTGCCTTT 14340 

ATATATTCAA TATTCTTGAT AGAATGATTG AACTATAGTA AATTGAAACT ATAATAGTAC 14400 

ACCGTGGATG CTAAAATATT TCTAGAAATT AATTTGATTT CCCTAATCAA GCTATTCGTA 14460 

TCTTATTTCA ATCTACTATA ATAAAATGAA CCAAAAATAG TACACAATGT GGTATAATCT 14520 

TCTTATGGCA TATTCAATAG ATTTTCGTAA AAAAGTTCTC TCTTATTGTG AGCGAACAGG 14580 
TAGTATAACA GAAGCATCAC ACGTTTTCCA AATCTCACGT AATACCATTT ATGGCTGGTT 14640

AAAGCTAAAA GAGAAAACAG GAGAGCTAAA CCACCAAGTA AAAGGAACAA AACCAAGAAA 14700 

AGTTGATAGA GATAGACTTA AAAACTATCT TACTGACAAT CCAGATGCTT ATTTGACTGA 14760 

AATAGCTTCT GACTTTGGCT GTCATCCAAC TACCATCCAC TATGCGCTCA AAGCTATGGG 14820 

CTACACTCGA AAAAAAGAAC CACACCTACT ATGAACAAGA CCCAGAAAAA GTAGCCTTAT 14880 

TTCTTAAGAA TTTTAATAGT TTAAAGCACC TAGCACCTGT TTAGATTGAC GAAACAGGAT 14940 

TCGATACTTA TTTTTATCGA GAATATGGTC GCTCATTAAA AGGTCAGTTA ATAAGAGGCA 15000 

AAGTATCTGG AAGAAGATAT CAGAGGATTT CTTTGGTTGC AGGTCTAACA AATGGTGAAT 15060 

TAATCGCTCC AATGACTTAC GAAGAGACGA TGACGAGCGA CTTTTTTGAA GCTTGGTTTC 15120 

AGAAGTTTCT CTTACCAACA TTAACCACAC CATCGGTTAT TATAGTAAAA TGAAATAAGA 15180 

ATAGGGGGGG GGGGGGAGGG GGGGGGAGGG AGA 15213 



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6004 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

TTATTACCTG AAACATTAAA TTTAATTGGA CATCCCGTTA TCAATTTTAT AATATCATCA 60 

AGATTTTTAT TATCTGATTC AGGAATTTTA TCTGATATAA CAACACCATT TTCAAGATAG 120 

TTCATTAAAT TATTTGATTC ACTAACATTA GTGTTTTGAT CTCCATCAAG CCAAAAATAA 180 

TGGTTATCGG AATCTAAATA CGATGAGTTT AAAATATTAT TACAAATTAT TTGATTTGCT 240 

CCACCAGGAA TATATCTCAC TACTAAATTC TGTTTAAGAT TCTCACTACC TGAATGAGTG 300 

ATAACAAACT CTAGAATATA TTTAGCTAGT CTATCTTCAA CATAAATCAT CTTCCTAGAA 360 

TGATACACAT CACCTAATTC AAAAAATGCA TCCTGATAAT CAATATTTTC AATAACATCT 420 

ACCTTTTCTC CGTTTTTCAC TAAAAGTTTC ACGGCTTCTC TAGGAAAATC TTTTATAAGT 480 

TGTGTAGAAT GTGTAGTGAT AATAATTTGA TGTTTTTTAT TTAAACACTC TTGAAGTAAA 540 

AACTCTTTAA ATTTATAGAT TGCACTCGGA TGAAGTGAGA TTTCAGGTTC ATCTATTAAT 600 

ATTAATGAAT TTGATTGCGC ATTTACTATA TCATTTACTA ACAAAATAAT TCTAGCCTCA 660 

CCTGTTCCTG CAAAAGCCTC GGAATATTCT TTTCCAGATT TTTTCATCCA AATAGTTTTG 720 
GAAGCTTTTA TATCATCACC TTTTGAATAC AACTTATGTG TTAAAATTTG AATGTCTGTA 780

TAAGATTCAT CCATTATTTC ACTAATAATT TCACAAACTT TATCATCAAC TTTAACATTA 840 

TCTATAACCA TTTCCTTTTT ATAACGCGTA TAGCTACTTG TATTATTCTT TAAAATATCA 900 

GCAACTGGCT TAGATCGTAA TCTTATAAAA TCTTGTTTAC TACGTTGAGT AGAAATTTTT 960 

TTAAAATTAT AGTGATAGAA AAATAAATCA AAAGCAGAAA CATATTCTTT ACAATCACAA 1020 

AAGACAACAT TTTTTTCAAT GCCATCCCAT CTGTCTGTCG AAGAACTTCC AATATATTTA 1080 

TTTTTGGGTA ATCTTTCCAT CTCATATTGT TTTTGAGGAG CATATGGTTC CCAATAATCT 1140 

AATCCTTTTT TTGTTCCAGA ACGGCCTTTA AGAACTTCTA CATTTCTAGA AGCTTTAATG 1200 

TTATAATATG AATAGATTAA ACATTGTTTC CCATCCACTT CATCTATTTG ATCAACATTT 1260 

GTACTAAACC AATATTCAGA CACACTTTTA TTGGCTGGAG AACCATATAA AGCTTGTAAA 1320 

ATTGAAGTTT TATTTACTCC ATATCTATTA CAGACACCTC AGGATTATTT AACTTATAAG 1380 

TTTTAACAGC TACGGAATCA ATTTCAACAG CAACTTGAAC ATCTATGCCT GATTTTTTAA 1440 

GGCCACTTGT AGTGCCACCT GCACCGTTAA ATAAATCAAT AGCAACAATT TTCCCCATAG 1500 

TATTCTCCTA AAGTTTCTCC TTTTTATTAT AACATTATCA AATGTAAAAC CCAACCCGAT 1560 

AGGGTTAGGT TTTTAACATC ATTTCACCAA CTTCTTCATC TCATCAATAC GTGCGACGGT 1620 

CGCGTCATAT TTAGCTTGGT AGTCAGCTTG TTTGTCGCAT TCTTTTTGGA CGACTTCTGG 1680 

TTTGGCGTTG GCTACGAAGC GTTCGTTAGA GAGTTTCTTA CCAACCATGT CCAGTTCTTT 1740 

TTGCCATTTA GCAAGTTCCT TGTCGAGACG GGCCAGTTCT TCTTCAACAT TGAGGAGATC 1800 

GGCCAGTGGC AGGTAGATTT CTGCTCCTGT GATGACACTT GACATAGCCA GTTCAGGTGC 1860 

AGGGATGGTT GATGCGATTT CCAAGTGTTC TGGATTTGTA AAGCGTTTGA TATAGTTGAC 1920 

ATTGCTGTTA AAGAAGGCTT CCAAGTCGCT ATCGCTTGTC TTAACAAGGA TGGTGATAGG 1980 

CTTGCTTGGT GCTACATTTA CTTCCGCACG CGCATTCCGA ACAGCACGAA TCAAGTCTTT 2040 

GAGACTTTCC ACACCAGTGT GAGCCGCAAG GTCTTCAAAG GCTAGATTAA CAGTTGGGTA 2100 

TGCAGCTGTC ACGATAGAAC CTTCTGAGAT TTGTCCAAAG ATTTCCTCTG TCACGAATGG 2160

CATGATTGGG TGAAGGAGAC GAAGGATCTT GTCCAGCGTA TAGAGGAGAA CAGATCGAGT 2220 

AATGACCTTA TCGTCTTCAT TGTCGCTGTA TAGAACTTCC TTGGTCAACT CAACATACCA 2280 

GTTGGCAAAT TCTTCCCAGA TGAAGTTGTA AAGGATATGA CCAGCCACAC CAAACTCGAA 2340 

CTTATCAAAG TTTTCAGTAA CTTTTGCAAT GGTTTCGTTG AGATTGTGGA GAATCCAGCG 2400 

GTCCGTCACA TTACCAGCCT CACCTGTTGC AACTTTTGTG ACATTGTCAT GCGCCACATC 2460 

CAGCGTCAAA CCTTCATTGT TCATGAGGAT ATAGCGAGAA ATGTTCCAAA TTTTGTTAAT 2520 
AAAGTTCCAT GAAGCATCCA TTTTCTCGTA AGAGAAACGA ACGTCTTGAC CTGGTGCGGA 2580

ACCGTTTGAA AGGAACCAAC GAAGGGCATC AGCACCGTAT TTCTCGATGA CATCCATTGG 2640 

GTCAATCCCG TTACCGAGAG ATTTAGACAT CTTGCGTCCT TGCTCGTCAC GGATGAGACC 2700 

GTGGATAAGC ACGTTTTGGA ATGGCTGACG ACCAGTAAAT TCCAAGGACT GGAAGATCAT 2760 

ACGAGACACC CAGAAGAAGA TGATGTCGTA ACCTGTTACC AAGGTTGAAG TTGGGAAATA 2820 

ACGTTTAAAG TCTTCTGAGT CGACTTCAGG CCAGCCCATG GTTGAAAATG GCCAGAGGGC 2880 

AGAACTGAAC CAAGTATCCA AGACGTCTTC GTCCTGAGTC CATCCGTCAC CTTCTGGAGC 2940 

TTCTTCGCCG ACATACATTT CACCATCAGC ATTGTACCAG GCAGGGATTT GGTGACCCCA 3000 

CCAAAGCTGA CGAGAGATAA CCCAGTCGTG GACATTTTCC ATCCATTGAA GGAAGGTATC 3060 

GTTGAAACGA GGTGGGTAGA ATTCGACCTT GTCCTCTGTG TCTTGGTTAG CAATGGCGTT 3120 

CTTAGCCAAT TGGTCCATCT TGACGAACCA TTGAGTAGAC AAGCGTGGCT CAACTACGAC 3180 

ACCTGTACGT TCTGAGTGAC CAACACTGTG GACACGTTTT TCGATTTTGA CAAGGGCACC 3240 

GATTTCTTCC AACTTAGCAA CGACTGCCTT ACGAGCTTCA AAACGATCCA TGCCTGAAAA 3300 

TTCAAAGGCA AGCTCATTCA TAGTTCCGTC GTCGTTCATG ACGTTGACTT GTGGCAAGTT 3360 

ATGACGTTGG CCAACCAAGA AGTCATTTGG ATCGTGGGCA GGTGTGATTT TCACGACACC 3420 

AGTACCAAGC TCAGGATCTG CGTGCTCATC TCCAACGATT GGGATGAGTT TATTAGCGAT 3480 

TGGAAGGATG ACGTTTTTAC CAATCAAGTC CTTGTAGCGC GGGTCTTCTG GATTAACCGC 3540 

AACCGCAACG TCCCCAAACA TAGTCTCAGG ACGAGTTGTA GCAACTTCAA GGGCGCGTGA 3600 

ACCATCTTCC AGCATGTAAT TCATGTGGTA GAAGGCACCT TCTACATCCT TGTGAATCAC 3660 

CTCAATATCA GAAAGGGCTG TGCGAGCTGC TGGGTCCCAG TTGATGATAA ACTCACCACG 3720 

ATAGATCCAG CCTTTCTTGT AAAGGTTCAC AAAGACCTTA CGAACAGCTT TTGACAAACC 3780 

TTCATCAAGA GTGAAACGCT CACGAGAATA GTCTACAGAA AGCCCCATCT TGCCCCATTG 3840 

TTCCTTGATG GTAGTGGCAT ATTCGTCTTT CCATTCCCAG ACCTTCGTCA AGAAAGACTC 3900 

ACGACCTAGG TCATAACGCG TAATACCCTC ACCACGTAAG CGCTCCTCAA CCTTAGCCTG 3960 

AGTCGCAATA CCAGCGTGGT CCATACCTGG AAGCCAAAGG GTATCAAAGC CTTGCATGCG 4020 

TTTTTGACGG ATGATGATAT CCTGCAAAGT CGTATCCCAA GCGTGACCAA GGTGAAGTTT 4080 

CCCAGTTACG TTTGGTGGTG GAATCACGAT TGAATAAGGC TTAGCCTTTT GATCGCCTGA 4140 

AGGCTTGAAA ACATCCGCAT CAAGCCATTT TTGGTAACGA CCAGCCTCAA CCTCGGCTGG 4200 

ATTGTATTTA GGTGAAAGTT CTTTAGACAT GTGTGTGTCC TTTCTCTATT TTGTTTATTT 4260 
TATTTTGAAT TTGCTTAGCA GCTTCTTCTG CAGACAAATT CGTATTATTT ATTTTAAAGT 4320

AGTGGTGCAA CTCATTCGGT TGATGTTGGG AATTTAATTG AAGTGTTTCA GCGGTCTCTA 4380 

AAATTTCTCT TTCAGATACC TCAATATGTC GTTTTAAGGG TTTGTGCTTT AATCGATTCT 4440 

CCGTTCGATT TCGACGTATG CACTCTTCAA GACTTGTTTC CAATTCAACA AACAGAATCT 4500 

CTTGATGAAA GTTATCCAAT AAATCCTGAA TTTGCTTTAA ATACATCAGC TGGTACTGAT 4560 

TTGAAAAATC AATTACGTCT GTTAAAATTA CTGATCGCTG ATTTCTTGCA CTTGCTCCAA 4620

GGAAAGAAAA GGTAATTCCA CGAACAAATT CCCACATCTC CTCGGTATAA TCCTGATAGA 4680 

TCTCTAGTGC AAAATCAATG GCTTGATGGT TATAAAATAG GGTAGCATCC GTCAGTCGAG 4740 

ATAATTCTTG ACCAATGGTC ATTTTTCCTG ATGCTGGAGC ACCAATGATG AAAAGATGCA 4800 

TCAAATCACC TCCCACTCAC TCCTCAGCAA GCCATATCTC AAATCATCAC AGCAGTTGCC 4860 

TTGAGCATCT TTGCGGTCTC TTATGCGAGC TTCGAGGGTA AAGCCAAGCT TTTCCGAGAC 4920 

TCGTTGACTT TGAAGGTTAT ATCCAAAGCA AGTTAGTTCA ATCTTGTGAA GACCAAGTTC 4980 

TTTAAAAGCT AGATCAATCA AGGAACACGC TGCTTCTGGA ACATAACCTC GACCCCAATA 5040 

GTCTGGGTGC AAGGTATAGC CAAGCTCTAG CACATCATCC GCATGAAGAT GGTTGAAGTC 5100 

AACAGAACCA ATGACTTTAT CGGTTCCTTT GACGACAATC CCATAGCCAG CTGGGAGATT 5160 

TTCCTTTTGA GTACGCTCCG GAAGAATGTG CTCCAGATAA TAAATCTCAT CTTCCAAGAT 5220 

CTTGACTGGA GGAAAACCTG CTGGATAGGC GACCTCTGGC AAACTAGCGT AGGTATGGAT 5280 

ATCCTCAGCA TCCACCACTG TGCGGACTCG TAAAACGAGA CGTTCTGTTT CGATTTTATC 5340 

TGGCAGCTCA GTTCTTGCCA TCCTTCTTCC TCGCTTTTTT GATGAAACTG CCCTTCATAT 5400 

CTACACGCTT GTCCAGATAG CGATAAACGC GCTGATATCC ATCTCCCATG AAATAGGTTG 5460 

GGGCAAACAG TTGATTTTTA AAATGTCCCT TTTCATCCAG GAGTTCTGGG GCAACAAGTC 5520 

GCTCAAGAAT CTTGGCAAAG ATGTGGCAAA TACCGTCTTC CTCAACAATC CTATCTACCC 5580 

GACAATCTAA AACAAGTGGA CAGGCGTCTA AAATAGGAGT CTGAGTTCGT TCAGAAATTT 5640

CATAATGCAC TCCCAAACGT TCCAATTTCT CCTGATGACT GATAAAACCA GCCTGCTCCA 5700 

TCGCAAGCAT AGAAGTTTCA TCAGAAATAT TCACAGTAAA TTTTTGATAC TGTTTGATCT 5760 

GCTCTGCGGC ATTCTCTCTC GCAACGACTC CAATCACAAC CCAATCTCCT AGACTATAAG 5820 

AGGAACTACA GGTCGTGATG TTATAGCCAA AATTCTAATC TTGATATCCT AAAATAAAAA 5880 

CAGGAAAACC ATAATATAGT TTACTTGTGT TAAAAGATTG CTTCATAACA ACCCCCTTTG 5940 

ACTAAGACGT AAAAGAAAAG CCCTGCCATC TACATGACAG GGACGAATGT GTTTATCCGC 6000 

GGGG 6004 
(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5857 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

TGTAGAATTC ACGACAATGC TTCGTTGATT TCTGGGTTGA TTTCGTCGCG TTCTGGCAAG 60 

CGAGTCAATG AACCAAAAAT AGTACACAAT GTGGTATAAT CCTTTTATGG CATATTCAAT 120 

AGATTTTCGT AAAAAAGTTC TCTCTTATTG TGAGCGAACA GGTAGTATAA CAGAAGCATC 180 

ACACGTTTTC CAAATCTCAC GTAATACCAT TTATGGCTGG TTAAAGCTAA AAGAGAAAAC 240 

AGGAGAGCTA AACCACCAAG TAAAAGGAAC AAAACCAAGA AAAGTTGATA GAGATAGACT 300 

TAAAAACTAT CTTACTGACA ATCCAGATGC TTATTTGACT GAAATAGCTT CTGACTTTGG 360 

CTGTCATCCA ACTACCATCC ACTATGCGCT CAAAGCTATG GGCTACACTC GAAAAAAGAA 420 

CCACACCTAC TATGAACAAG ACCCAGAAAA AGTAGCCTTA TTTCTTAAGA ATTTTAATAG 480 

TTTAAAGCAC CTAACACCTG TTTAGATTGA CGAAACAGGA TTCGATACTT ATTTTTATCG 540 

AGAATATGGT CGCTCATTAA AAGGTCAGTT AATAAGAGGC AAAGTATCTG GAAGAAGATA 600 

TCAGAGGATT TCTTTGGTTG CAGGTCTAAC AAATGGTGAG TTAATCGCTC CAATGACTTA 660 

CGAAGAGACG ATGACGAGCG ACTTTTTTGA AGCTTGGTTT CAGAAGTTTC TCTTACCAAC 720 

ATTAACCACA CCATCGGTTA TTATTATGGA TAATGCAAGA TTCCATAGAA TGGGGAAGCT 780 

AGAACTCTTG TGTGAAGAGT TTGGGTATAA ACTTTTACCT CTTCCTCCCT ACTCACCTGA 840 

GTACAATCCT ATTGAGAAAA CATGGGCTCA TATCAAAAAG CACCTCAAAA AGGTATTACC 900 

AAGTTGCAAT ACCTTTTATG AGGCTTTTTT GTCTTGTTCT TGTTTCAATT GACTATATAA 960 

ATTGTCTAAG CGAAACAACC GATAAGAATT GGCACAAAAG CGACCGTATT TTTGTTACCA 1020 

ATACAGGAAA AACAGTTCAT AGTTCTATCT TGAGCAAGTC TCTCCAGCGA GCAAACGAAC 1080 

GCCTTAAAAA ACCAATTCCC AAACATCTGT CCCCTCACAT CTTCAGACAC ACCACTATTA 1140

GCATCTTATC AGAAAATAAA ATTCCTTTAA AAACAATCAC GGACAGGGTT GGTCATCCCG 1200 

ACTCTGAAGT CACTACTTCC ATCTACACCC ACGTCACAAA GAACATGAAA GATGAAGCAA 1260 

TCAATGTACT GGATAAAGTT ATGAAAAAGA TTTTTTAAAA AGTTTTGTCC CTTTTTTGCC 1320 

CTCTAAATAC AAAAATAGCC CTTCGGATAA AATCCGAGGG GCTAGAAACG TTGTTAAATC 1380 
AACGGCCGAA CTTTTGAATT TCATGGTTCG GGATAAAATA GTTCACTGAA CTATTTTATT 1440

TTTTAAGGTT ATCATAATAT CAAATAGTTC AATTAAATAC GCTAAATTAC TAATATACTT 1500 

TTTACCTTTT TCATTCTAAA ATGTAAAGTA CAAACAATTA CAATATACTA GAGGGGGAGT 1560 

AAAAAAGGTA TTAAATCGAT GAGTTCAGCA GGCAAGAAAA TAGCACCTTT ACGGGTGCTA 1620 

TTTTTTAATT AACGCCACGT TAACTTTTGA TTGATGAATT TTATTGTTTG GCACTTCTTT 1680 

CATTTCACGG TAAACATCGA TGAAATTCTT TCCAACATTA TTTTTGGAGT TAACTGCATT 1740 

TATTTTTGTA TTAATAACTT TTTTAGTATC GAAAGAATGG TTTAAGAAAT CCATAACTAA 1800 

CTCTCCTTTC TCATCCTGTA ATCAAGATTT TTATCAATGT CAAAATAGTA TTTTCTATCA 1860 

ATCCAAATTG GTCCTTCTCC TTTAGAAATA GCAAGTACAT CTACCGGACC TCCTACTGTT 1920 

TCAAGAGTGT TGACAATTTT TCTCTTAAAT GAAGTTAATT CAATAAATGT TTTAGCTGTA 1980 

CTCGCCATTT CATTAAGTGG TTGCATTCCA ATAAGGTCTA TTATAGGATT TATATAATAT 2040 

TTTTGCTGTA TAGATGATAT ATTTTCAAAT ATATTCTCAA TTTCATCACC CAATCCATTT 2100 

TTCTCCATAA CTGATGATAC TTGCTCTGCG ATATATACAT TTAAGTTAGG ATCTATACCA 2160 

TTCATAATCG TCTCAACCAT CTCTGACTGT GCAAAAGGGA TTATATGACA AGTTTTATGA 2220 

TGATTTATCA CACTTTCATT AATAACTTTC CAAATTAATC GTTTAGAAAA AATTCCATAT 2280 

AATTCAATTT GTCTTATAGA TGGAAATATC TCGTCTGTAC CATAACCTGC TATAACTAAT 2340 

CCAGTTATGT TTGTTGAGTC ATATCCAATG AAAATCGCTT TATATAAAGA TTTAGCAATA 2400

ACTTCAACCT CATCATCAGT ATGAGGAAAG GATTTAAAAA CATCGTCTAC AATGCTTTTT 2460 

ATTAACTCTA ACTCAGCTTC AAAAAATTCA AAATTACTTT CAGCTTCTAC TTTTGAAATT 2520 

TCTAAACTAA AATTAGTTAT AGCATTTAAT AAAATTTTAT TAAAATCATC TAGAGTGATG 2580 

GTTTCACCAT TAGAAACTCT TAAATCAGCT GTTTCTTGCG CTTCATAGGC AATGCTGTCC 2640 

AAAATACTTC TTGTACTTCT GACAATATAA TTTCTTAATA AATCCTCAAC TTGTAGATGT 2700 

TTAAAGGAAA TTAAAAATTC TATTAGCTTT TCAACGTATT GGGCAGTATT ATCTAATAAA 2760 
TCTGTGCCAA TAGCCTGCTT AAACTCATTT AAAATTACCT CCCACGGAAT TTCCATAAAC 2820

GAAGCGTTCC CATATATCAT GATCCCCACG GAATGTTCTT TTGATAAAGT GAATAATTTT 2880 

CGGGCGCTAT TAAAAACTTT TGAATTTTTC CCGTCTGATA AGGTTACAGC GCTATCAGAA 2940 

GCCAATACAA CACCATTTTT ATTTAATATT CCAATTTCTG CTGTCAAAAT ATCACCTAAA 3000 

CTTTCTAAAC CTGCTCATGC TCTAATGGTA CAACAGCTAA GGTCTTACCA AGACTTGCCA 3060 

ACACTTTTAA TACTGTATCA AGTTGTGGGC TTGTCTTTCC TGTTTCCATT CTAGCGATAA 3120 

CTGGCTGACT AACACCGCTC ATCTCCTCTA GTTTCTTCTG ACTAATACCC TTTTCATTTC 3180 
TAGCCTCGAT AAGCTCACTC ATGATAGCCA CGCGCATATC ACTTTCCAAA ATTTCCTCTT 3240

TGCTGAATAA TTCAGCTCTT ACATCTTTCC AGTTACTACC AATAGCATTA TTTTTCATTG 3300 

TCTAAACCTC TTTCTTTTAA ATCTGCAAGT TCACGTTTAG CTTGCTCAAT CTCTCTTTTG 3360 

GGTGTTTTCT GTGTCCTTTT CATAAAATGA TGCAGTAAAA CAAAACTACC ATCCATCCAA 3420 

GCAACAAATA AAATTCTATC TCTAAGTGGT CTCAGCTCCC AAATTTCAGC ATCTAAATGC 3480 

TTAATATATG GTTCGCCTGC GCGTGTTCCA TGTTGGCTTA ACAACTCAAT ATAATCATTA 3540 

ATTTTATTAA GCTTAATTCT GCTATCTTTC CCTTTTTTAC TGGTAAGCTC TCGCATATAA 3600 

TCAAAAACAG GCTCATTGCC GTTTTTATCC TTGTAAAAAT AGATATTATG CACTATTAAC 3660 

ACCTCTTCCT AATAACAATT ATAACCTAAA AGTTATTGTT TGTAAATACT TTTAAGTTAT 3720 

TAAAATAAAA AGCACCTAGT TTCCTAGATG CTAGCACAAT GACACGGATT CGCACCGTGG 3780 

CTACCTCTAT CAAGGTGTAC TCCTTCTATA CTATCCCTTG TGCTTTAGAA TATTATACCA 3840 

CACAATCAAC TAGATACCTA CCATCTCATG ATATACCCCC ATTTTGGGCA AGGGTACAAC 3900 

GCTAAAATAC AAATCAGAAT AGATATTAAA CCACTTATTT AACTTATCAT AAGCTGGTGA 3960 

TTGACTGATA AATAATATCC GCTGACAAGC TCCGATAACA TTCATGTGAT TGTACACATA 4020 

AACCTCTTTT ACAGCCTCTA AAATGTCAGC CTCACTTGTT TGTACCCTAA TATCTGTTAT 4080 

CTGCTTGATA GTTGCGTATT TTTGATAAGC TAGCATATCT TGATTTTTAG CAGCATCAAA 4140 

CATTTTACGC TCAAGGACAC TATACTTAGG TTGTTCTTTA TCTCGCATGA AATACCACTT 4200 

GAGCCATAAA ATCTTTTCTC GGTGTATTAC AGAAATACGC TCAATTTTCT TCTTTGTCAT 4260 

TGCTACCTCC TAAATCATCA ATTTAACAAT TCTAACCACT CACTTTTAGA AATAGTTGCA 4320 

TAGATCTTGT TCGATGTATG ATACAAAGGT TCTAAATCTT TTTCCACCCT AATATAGTTC 4380 

ATCTTATCCT CATGAGTAGG AAAGTATAGT ATTTCCGTTT CATCCTCGTT TAGGATACGA 4440 

TTGCACCAAT CATCAATAAT AACTGGCACT TCCCACTCAC GCCATTTTTT AAGGTTTTCT 4500 

AAAAGTTCAT TATCACTAAA TAGCTCGCCA TCTATTTGGA AAAATTCCCC TAAGTCATTG 4560 

TTTCCTTCAA CAATAATAAA CTCTGGCATA TTTCTATTAC TTAATAACTC CTTGAGTTCT 4620 

TGTAACTCTT TGATTTCCTT TAGATACTTC CTCAATTTCC AACCTCAATT CTTCAATCTG 4680

CCTTACTACT CCAAAAATTT CATGGGTCTT ATAAGATTGT TCAAGTATAG CCTTTGCTGC 4740 

TTGAGTTCTT ATAAACGGGT TGACCTTACT GTCCATCATA ATATCATTGA GTACAGAAAC 4800 

AGCGTTAGAT GATGCTAAAT AAAGCATTTG AGTTGTTTTA TCCATCATCT CATCTTGCTT 4860 

TATCCTCAAT GTCTTTTTAA CCGCTGCAAC TTTTAGATAC TTATGACCTG TTGCGCGTGA 4920 
TACCCCTGCT TTTTGACATG CTTTGTCTAT CGTTGGCTCG GTAAGCATGG CATCTATGAA 4980

TTTAATTTGC TTGGACGTAA GGTTATCATT TTCATTTCCT GCCATCTATT ACCTCCTCAT 5040 

TATCAAAATA AAGGGTTGCC CCTTTATTTC CCTATGCTAG ATAATTCTGC AATTCTGCAT 5100

CCATTGCCTC TGAATTGCCC TCAACAATCA TTTCATGCTG TACTAAATCA ATCTTATCTC 5160 

CGTTAATAAG TAAACCACCG TGGAAATAAT CAATTTTTCT ATCAAGGAAA TGTACTAGCT 5220 

TTTCAAGGCG TTGCTGTTGG CTGAATTGCT CCATGTCAAT TTCGATATAA GCAAGGGTAG 5280 

TATCATTATC CATAATATCT TCTAATTTTC TAAGAGCTAG AGGTTTATTT TTATATTTTT 5340 

CTAGGTATTC TCTCATTTCT GCCACTGTTA ATTTGATACT AGATAATAAA CTTAGTTCAG 5400 

CTGCATCATC TGCTGTAATA GGCTCTTCTT TTGATTCATG GTTTGCTAGT TCAGCATTTT 5460 

TCTCTTTTTC TAGTTGCTGA TACAATAGCT GAGCAGTATT TTGGGAATAG TTTTCGCCCT 5520 

CTTTTTTATA TTTTAAAAGT TCTTGCTCTG CATACACTTT CCCGATAATC ACTTCCTTAT 5580 

AAACTAATTG CCCATCTTGA GCTTTTAGCT TAATACTCCC ATGCTCTGGA ATTTCAATAT 5640 

ACTTAATTAT ACCATTTTTT GAGTATAAAA CAAAGCCTTT CTCCATCATT TTTAATAATT 5700 

TATCATCCTT GTTTTCAGTC ATGCTTTTCT CCTTTATTTC ATTTTATTAT AATCTGAATA 5760 

CCCCTAGTCT ATTTATTTCA CTAGGTTTTT AGGGTTCGTA TGCTAAAATA CTACCCTTTT 5820 

TGTGTACCTT ATGGCTGACT TTTCAAATTG GTTAGTT 5857 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10254 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

AAAATGATAG CAGGAGAGTT TTCCCGTCCA TCAGACCCAG AACTGAGAGC CTTAGCTCAG 60 

GCTTCTCGCC AAAAACAGGC CGCCTTTAAC AAGGAAGAGA ACCCCTTGAA GGGAGCCGAA 120 

ATCATCAAGA CTTGGTTTGC CTCAACCGGG AAAAATCTTT ACATCAACAC TCGCTTGATG 180 

GTGGACTACG GTGTCAACAT CCATCTAGGG GAAAATTTTT ATTCTAATTG GAACTTGACC 240 

ATGCTGGATA TCTGTCCCAT TCGTATCGGG GACAATGCTA TGATTGGTCC TAATTGTCAG 300 

TTTTTGACAC CCCTCCATCC ACTAGATCCA CAGGAACGCA ATTCAGGTAT CGAGTACGGA 360 

AAGCCTATCA CAATCGGAGA TAATTTCTGG ACTGGTGGTG GCGTCATTGT CCTTCCTGGA 420 

GTGACACTGG GAAATAATGT CGTTGCAGGA GCAGGGGCAG TAATTACCAA ATCTTTTGGC 480 
GACAACGTTG TCCTAGCTGG CAATCCTGCG CGCGTGATTA AGGAAATACC TGTTAAATAG 540

AAGTAAAAAG GAACAGCTGG GGTTGTTTCT TTTTTGTAGG TTTCATCATT TTTTACCCAG 600

TTCACATTTA CCTACTCTAT CTCTTAGCAA GTCTGTTTCA TTAAGCAAGT TCAAAGCATC 660

TCGTAAGTGG GATGTTTTTC TCCTCAGTTC ATCAGCTTCC TCCTTGACAC TCGGTCAGAT 720

TTTGATACAA TAGTACAAAA TTAGAGGAGG CAGGCTATGA TTCAGAAACA TGCGATTCCT 780

ATTTTAGAGT TTGATGACAA TCCTCAGGCG GTTATCATGC CCAATCACGA GGGGCTGGAC 840

TTGCAGTTGC CAAAGAAGTG TGTTTATGCA TTTTTAGGTG AGGAGATTGA CCGCTATGCG 900

AGGGAAGTAG GGGCGAACTG TGTTGGCGAA TTTGTTTCTG CCACCAAGAC CTATCCAGTT 960

TATGTCGTGA ACTACAAGGA CGAGGAGGTC TGTCTGGCTC AGGCTCCTGT TGGCTCCGCT 1020

CCAGCAGCCC AGTTTATGGA TTGGTTGATT GGCTATGGTG TGGAGCAGAT TATCTCTACT 1080

GGGACCTGTG GTGTCCTAGC TGATATAGAG GAAAATGCCT TTCTAGTCCC TGTTCGCGCT 1140

CTGCGAGATG AAGGAGCCAG TTACCACTAT GTGGCACCTT GTCGTTATAT GGAAATGCAG 1200

CCAGAGGCTA TTGCTGCTAT TGAGGAAGTT TTGGAAGACA GAGGGATTCC TTATGAAGAA 1260

GTCATGACCT GGACGACAGA CGGTTTTTAC CGAGAAACGG CTGAAAAGGT GGCTTATCGT 1320

AAGGAAGAAG GCTGTGCTGT TGTGGAGATG GAGTGTTCTG CTCTTGCGGC AGTAGCTCAA 1380

TTGCGTGGGG TTCTCTGGGG TGAATTGTTG TTCACAGCAG ATTCTCTAGC GGACTTGGAC 1440

CAGTACGACA GTCGTGACTG GGGCTCGGAA GCTTTTAATA AGGCGCTAGA ACTGAGTTTA 1500

GCAAGTGTTC ACCACCTTTA GTTGTACTGG CAAAGGATTT GTTTTATCAT AAAATGTCTA 1560

GCTCATACTT TTCAAAAATA TGTTTAAACG AGGTCACCTT CCTCTTGTCC TAGGCATGTT 1620

GAGGTTGGGA AAAATCTTTA AAATCAGAAA AACGTATCAT ATCAGGTGAT GAAAACTTTG 1680

ACACTATGCG TTTTATGTCG ATAAGATTTA GAGTGAGATG AAATGATACT CTTCGAAAAT 1740

CTCTTCAAAC CAGGTCAGCT TCACCTTGCC GTAGGTATAT GTTACTGACT TCGTCAGTCT 1800

TATCCGGCAA CCTCAAAACG GTGTTTTGAG CTGACTTCGT CAGTTCTATT TGCAACCTCA 1860

AAACAGTGTT TTGAGCAACC TGTGACTAGC TTTCTAATCG ATGCCTTGGT TTTCATTGCC 1920

TATAATCAAA AAGAGAAATT TTCTCCTGAA AAGCATATAG AGTAGCTGGC GTTAAAAGCT 1980

CCTGTCTTGC TTTTTTGACC TATAGTCACA TCTATCAAGT ATTGTTCTTG CCTAAGCTAT 2040

CAATAAAAAG GTGGCATTTT TTAGGCTTGG TGTTAGTAGA TTTTGCCTTA TCCTATCTAA 2100

GTCATTTCGA ACTTTTTATG GTACAATGGA AACATGTTAT TCAAATTATC TAAGGAAAAA 2160

ATAGAGCTAG GCTTATCTCG TTTATCGCCA GCCCGTCGTA TTTTTTTGAG TTTTGCCTTG 2220
GTCATTTTAC TAGGCTCTCT TCTTTTGAGC TTGCCCTTTG TCCAAGTTGA AAGCTCACGA 2280

GCGACTTATT TTGATCATCT TTTCACTGCT GTCTCTGCAG TCTGTGTGAC GGGTCTCTCA 2340 

ACCCTTCCAG TAGCTCACAC CTATAATATC TGGGGTCAAA TAATCTGTTT GCTCTTGATT 2400 

CAGATCGGTG GTCTAGGGCT CATGACCTTT ATTGGGGTTT TCTATATCCA GAGCAAGCAA 2460 

AAGCTTAGTC TTCGTAGCCG TGCAACTATT CAGGATAGTT TTAGTTATGG AGAAACTCGA 2520 

TCTTTGAGAA AGTTTGTCTA TTCTATTTTT CTCACGACCT TTTTGGTTGA GAGCTTGGGA 2580 

GCTATTTTGC TTAGTTTTCG CCTTATTCCT CAACTTGGCT GGGGACGTGG TCTTTTTAGT 2640 

TCCATTTTTC TAGCGATCTC AGCCTTCTGT AATGCCGGTT TTGATAATTT AGGGAGCACC 2700 

AGTTTATTTG CTTTTCAGAC CGATTTACTG GTCAATCTGG TGATTGCAGG CTTGATTATT 2760 

ACAGGCGGCC TTGGTTTTAT GGTCTGGTTT GATTTGGCTG GTCATGTAGG AAGAAAGAAA 2820 

AAAGGACGTC TGCACTTTCA TACGAAGCTT GTACTATTAT TGACTATAGG TTTGTTGTTA 2880

TTTGGAACAG CAACTACTCT CTTTCTTGAG TGGAACAATG CTGGAACGAT TGGCAATCTC 2940 

CCTGTTGCCG ATAAGGTTTT AGTTAGCTTT TTTCAAACAG TGACGATGCG AACAGCTGGC 3000 

TTTTCTACGA TAGATTATAC TCAGGCTCAT CCTGTGACTC TTTTGATTTA TATCTTACAG 3060 

ATGTTTCTAG GTGGGGCACC TGGAGGAACA GCTGGGGGAC TCAAGATTAC GACATTTTTT 3120 

GTCCTCTTGG TCTTTGCACG AAGTGAGCTT CTAGGCTTGC CTCATGCCAA TGTTGCGAGA 3180 

CGAACGATCG CGCCGCGAAC GGTTCAAAAA TCCTTTAGTG TCTTTATTAT CTTTTTGATG 3240 

AGCTTCTTGA TAGGATTGAT TCTGCTAGGG ATAACAGCCA AAGGCAATCC TCCCTTTATC 3300 

CACCTCGTAT TTGAAACCAT TTCAGCTCTT AGTACAGTTG GTGTAACGGC AAATCTGACT 3360 

CCTGACCTTG GGAAATTGGC TCTCAGTGTT ATCATGCCAC TTATGTTTAT GGGACGAATT 3420 

GGTCCCTTGA CCTTGTTTGT TAGCTTGGCA GATTACCATC CAGAAAAGAA AGATATGATT 3480 

CACTATATGA AAGCAGATAT TAGTATTGGT TAAGAAAGGA AAGAGCATGT CAGATCGTAC 3540 

GATTGGAATT TTGGGCTTGG GAATTTTTGG GAGCAGTGTC CTAGCTGCCC TAGCCAAGCA 3600 

GGATATGAAT ATTATCGCTA TTGATGACCA CGCAGAGCGC ATCAATCAGT TTGAGCCAGT 3660

TTTGGCGCGT GGAGTGATTG GTGACATCAC AGATGAAGAA TTATTGAGAT CAGCAGGGAT 3720

TGATACCTGC GATACCGTTG TAGTCGCGAC AGGTGAAAAT CTGGAGTCGA GTGTGCTTGC 3780 

GGTTATGCAC TGTAAGAGTT TGGGGGTACC GACTGTTATT GCTAAGGTCA AAAGTCAGAC 3840 

CGCTAAGAAA GTGCTAGAAA AGATTGGAGC TGACTCGGTT ATCTCGCCAG AGTATGAAAT 3900 

GGGGCAGTCT CTAGCACAGA CCATTCTTTT CCATAATAGT GTTGATGTCT TTCAGTTGGA 3960 

TAAAAATGTG TCTATCGTGG AGATGAAAAT TCCTCAGTCT TGGGCAGGTC AAAGTCTGAG 4020 
TAAATTAGAC CTCCGTGGCA AATACAATCT GAATATTTTG GGTTTCCGAG AGCAGGAAAA 4080

TTCCCCATTG GATGTTGAAT TTGGACCAGA TGACCTCTTG AAAGCAGATA CCTATATTTT 4140 

GGCAGTCATC AACAACCAGT ATTTGGATAC CCTAGTAGCA TTGAATTCGT AAAGAGGGAT 4200 

GACCCCTCTT TTTTGATGCC TAAGATGGCA AATAGAGACA GAAGCCCCTT GTCTTCTAGT 4260 

AAAAGTTCTT CAAAGGCTGG ACTTTATGGT AAAATAGAAA GAAGTGACAA GAGAGAGTAA 4320 

TACTCAATGA AAATCAAAGA TCAAACTAGG AAACTAGCTA CGGGCTGCTC AAAACACTGT 4380 

TTTGAGGTTG CAGATAGAAC TGACGAAGTC AGTAACATCT ATACGGCAAG GCGACGTTGA 4440 

CGCGGTTTGA AGAGATTTTC GAAGAGTATA AGAAAAAATC AGTCCCCTAA AGGAGTAGAT 4500 

TATGAAGTTA TTGTCTATCG CAATTTCTAG CTATAATGCA GCAGCCTATC TTCATTACTG 4560 

TGTGGAGTCG CTAGTGATTG GTGGTGAGCA AGTTGGGATT TTGATTATCA ATGACGGGTC 4620 

TCAGGATCAG ACTCAGGAAA TCGCTGAGTG TTTAGCTAGC AAGTATCCTA ATATCGTTAG 4680 

AGCCATCTAT CAGGAAAATA AATGCCATGG CGGTGCGGTC AATCGTGGCT TGGTAGAGGC 4740

TTCTGGGCGC TATTTTAAAG TAGTTGACAG TGATGACTGG GTGGATCCTC GTGCCTACTT 4800 

GAAAATTCTT GAAACCTTGC AGGAACTTGA GAGCAAAGGT CAAGAGGTGG ATGTCTTTGT 4860 

GACCAATTTT GTCTATGAAA AGGAAGGGCA GTCTCGTAAG AAGAGTATGA GTTACGATTC 4920 

AGTCTTGCCT GTTCGGCAGA TTTTTGGCTG GGACCAGGTC GGAAATTTCT CCAAAGGCCA 4980 

GTATACCATG ATGCACTCGC TGATTTATCG GACAGATTTG TTGCGTGCTA GCCAGTTCTA 5040 

ACTGCCTGAA CATACTTTTT ATGTCGATAA TCTCTTTGTC TTTACGCCCC TTCAGCAGGT 5100 

CAAGACCATG TACTATCTGC CTGTCGATTT CTATCGTTAT TTGATTGGGC GTGAGGACCA 5160 

GTCTGTCAAT GAGCAAGTGA TGATTAAGTG CATTGACCAG CAACTCAAGG TCAATCGACT 5220 

CTTGATAGAC CAACTTGATT TGTCCCAAGT GAGTCATCCC AAAATGCGAG AATATCTGCT 5280 

GAATCATATT GAACTCACGA CGGTGATTTC CAGTACCCTG CTCAACCGAT CTGGAACAGC 5340 

GGAGCATCTG GCAAAAAAAC GCCAATTGTG GACCTATATT CAGCAGAAAA ATCCAGAAGT 5400 

CTTTCAGGCT ATTCGTAAGA CCATGTTGAG CCGTTTGACC AAACATTCTG TCTTGCCAGA 5460

TCGCAAACTG TCCAATGTCG TCTATCAAAT CACCAAATCT GTTTATGGAT TTAATTAATA 5520 

TAAGTGTTTT ATAAGAGGGA TTTAAGAAAA ATTTTAACTT TTTCTTAGTC CTTTTTAATT 5580 

TCAGGAGATT ATACTAGAGT CATCAAATAA AGAAAGACTC TAAGGAGAAT CCTATGAAAT 5640 

TCAATCCAAA TCAAAGATAT ACTCGTTGGT CTATTCGCCG TCTCAGTGTC GGTGTTGCCT 5700 

CAGTTGTTGT GGCTAGTGGC TTCTTTGTCC TAGTTGGTCA GCCAAGTTCT GTACGTGCCG 5760 
ATGGGCTCAA TCCAACCCCA GGTCAAGTCT TACCTGAAGA GACATCGGGA ACGAAAGAGG 5820

GTGACTTATC AGAAAAACCA GGAGACACCG TTCTCACTCA AGCGAAACCT GAGGGCGTTA 5880 

CTGGAAATAC GAATTCACTT CCGACACCTA CAGAAAGAAC TGAAGTGAGC GAGGAAACAA 5940 

GCCCTTCTAG TCTGGATACA CTTTTTGAAA AAGATGAAGA AGCTCAAAAA AATCCAGAGC 6000 

TAACAGATGT CTTAAAAGAA ACTGTAGATA CAGCTGATGT GGATGGGACA CAAGCAAGTC 6060 

CAGCAGAAAC TACTCCTGAA CAAGTAAAAG GTGGAGTGAA AGAAAATACA AAAGACAGCA 6120 

TCGATGTTCC TGCTGCTTAT CTTGAAAAAG CTGAAGGGAA AGGTCCTTTC ACTGCCGGTG 6180 

TAAACCAAGT AATTCCTTAT GAACTATTCG CTGGTGATGG TATGTTAACT CGTCTATTAC 6240 

TAAAAGCTTC GGATAATGCT CCTTGGTCTG ACAATGGTAC TGCTAAAAAT CCTGCTTTAC 6300 

CTCCTCTTGA AGGATTAACA AAAGGGAAAT ACTTCTATGA AGTAGAGTTA AATGGCAATA 6360 

CTGTTGGTAA ACAAGGTCAA GCTTTAATTG ATCAACTTCG CGCTAATGGT ACTCAAACTT 6420 

ATAAAGCTAC TGTTAAAGTT TACGGAAATA AAGACGGTAA AGCTGACTTG ACTAATCTAG 6480 

TTGCTACTAA AAATGTAGAC ATCAACATCA ATGGATTAGT TGCTAAAGAA ACAGTTCAAA 6540 

AAGCCGTTGC AGACAACGTT AAAGACAGTA TCGATGTTCC AGCAGCCTAC CTAGAAAAAG 6600 

CCAAGGGTGA AGGTCCATTC ACAGCAGGTG TCAACCATGT GATTCCATAC GAACTCTTCG 6660 

CAGGTGATGG CATGTTGACT CGTCTCTTGC TCAAGGCATC TGACAAGGCA CCATGGTCAG 6720 

ATAACGGCGA CGCTAAAAAC CCAGCCCTAT CTCCACTAGG CGAAAACGTG AAGACCAAAG 6780 

GTCAATACTT CTATCAAGTA GCCTTGGACG GAAATGTAGC TGGCAAAGAA AAACAAGCGC 6840 

TCATTGACCA GTTCCGAGCA AAyGGTACTC AAACTTACAG CGCTACAGTC AATGTCTATG 6900 

GTAACAAAGA CGGTAAACCA GACTTGGACA ACATCGTAGC AACTAAAAAA GTCACTATTA 6960 

ACATAAACGG TTTAATTTCT AAAGAAACAG TTCAAAAAGC CGTTGCAGAC AACGTTAAAG 7020 

ACAGTATCGA TGTTCCAGCA GCCTACCTAG AAAAAGCCAA GGGTGAAGGT CCATTCACAG 7080 

CAGGTGTCAA CCATGTGATT CCATACGAAC TCTTCGCAGG TGATGGTATG TTGACTCGTC 7140 

TCTTGCTCAA GGCATCTGAC AAGGCACCAT GGTCAGATAA CGGTGACGCT AAAAACCCAG 7200 

CCCTATCTCC ACTAGGTGAA AACGTGAAGA CCAAAGGTCA ATACTTCTAT CAATTAGCCT 7260 

TGGACGGAAA TGTAGCTGGC AAAGAAAAAC AAGCGCTCAT TGACCAGTTC CGAGCAAACG 7320 

GTACTCAAAC TTACAGCGCT ACAGTCAATG TCTATGGTAA CAAAGACGGT AAACCAGACT 7380 

TGGACAACAT CGTAGCAACT AAAAAAGTCA CTATTAACAT AAACGGTTTA ATTTCTAAAG 7440 

AAACAGTTCA AAAAGCCGTT GCAGACAACG TTAAGGACAG TATCGATGTT CCAGCAGCCT 7500 

ACCTAGAAAA GGCCAAGGGT GAAGGTCCAT TCACAGCAGG TGTCAACCAT GTGATTCCAT 7560 
ACGAACTCTT CGCAGGTGAT GGCATGTTGA CTCGTCTCTT GCTCAAGGCA TCTGACAAGG 7620

CACCATGGTC AGATAACGGC GACGCTAAAA ACCCAGCTCT ATCTCCACTA GGTGAAAACG 7680 

TGAAGACCAA AGGTCAATAC TTCTATCAAG TAGCCTTGGA CGGAAATGTA GCTGGCAAAG 7740 

AAAAACAAGC GCTCATTGAC CAGTTCCGAG CAAACGGTAC TCAAACTTAC AGCGCTACAG 7800 

TCAATGTCTA TGGTAACAAA GACGGTAAAC CAGACTTGGA CAACATCGTA GCAACTAAAA 7860 

AAGTCACTAT TAAGATAAAT GTTAAAGAAA CATCAGACAC AGCAAATGGT TGATTATCAC 7920 

CTTCTAACTC TGGTTCTGGC GTGACTCCGA TGAATCACAA TCATGCTACA GGTACTACAG 7980 

ATAGCATGCC TGCTGACACC ATGACAAGTT CTACCAACAC GATGGCAGGT GAAAACATGG 8040 

CTGCTTCTGC TAACAAGATG TCTGATACGA TGATGTCAGA GGATAAAGCT ATGCTACCAA 8100 

ATACTGGTGA GACTCAAACA TCAATGGCAA GTATTGGTTT CCTTGGGCTT GCGCTTGCAG 8160 

GTTTACTCGG TGGTCTAGGT TTGAAAAACA AAAAAGAAGA AAACTAATCA GCTAAGGAAA 8220 

TAAATGATGG ATAGTGGGCT GACTAAGATT AGTTTAACAA CTCAATCAGC AATCAGGACT 8280 

TTCTTTCAAT AGCAGATTAA AATCATCGTA AAACAATAAA AATAGTGTTA TACTTAAAGC 8340 

AGTATAGCAC TGTTTTTATC AAAGGAGAGA CAGATGGGAA AGACAATTTT ACTCGTTGAC 8400 

GACGAGGTAG AAATCACAGA TATTCATCAG AGATACTTAA TTCAGGCAGG TTATCAGGTC 8460 

TTGGTAGCCC ATGATGGACT GGAAGCGCTA GAGCTGTTCA AGAAAAAACC GATTGATTTG 8520 

ATTATCACAG ATGTCATGAT GCCTCGGATG GATGGTTATG ATTTAATCAG TGAGGTTCAA 8580 

TACTTATCAC CAGAGCAGCC TTTCCTATTT ATTACTGCTA AGACCAGTGA ACAGGACAAG 8640 

ATTTACGGCC TGAGCTTGGG AGCAGATGAT TTTATTGCTA AGCCTTTTAG CCCACGTGAG 8700 

CTGGTTTTGC GTGTCCACAA TATTTTGCGC CGCCTTCATC GTGGGGGCGA AACAGAGCTG 8760 

ATTTCCCTTG GCAATCTAAA AATGAATCAT AGTAGTCATG AAGTTCAAAT AGGAGAAGAA 8820 

ATGCTGGATT TAACTGTTAA ATCATTTGAA TTGCTGTGGA TTTTAGCTAG TAATCCAGAG 8880 

CGAGTTTTCT CCAAGACAGA CCTCTATGAA AAGATCTGGA AAGAAGACTA CGTGGATGAC 8940 

ACCAATACCT TGAATGTGCA TATCCATGCT CTTCGACAGG AGCTGGCAAA ATATAGTAGT 9000 

GACCAAACTC CCACTATTAA GACAGTTTGG GGGTTGGGAT ATAAGATAGA GAAACCGAGA 9060 

GGACAAACAT GAAACTAAAA AGTTATATTT TGGTTGGATA TATTATTTCA ACCCTCTTAA 9120 

CCATTTTGGT TGTTTTTTGG GCTGTTCAAA AAATGCTGAT TGCGAAAGGC GAGATTTACT 9180 

TTTTGCTTGG GATGACCATC GTTGCCAGCC TTGTCGGTGC TGGGATTAGT CTCTTTCTCC 9240 

TATTGCCAGT CTTTACGTCG TTGGGCAAAC TCAAGGAGCA TGCCAAGCGG GTAGCGGCCA 9300 
AGGATTTTCC TTCAAATTTG GAGGTTCAAG GTCCTGTAGA ATTTCAGCAA TTAGGGCAAA 9360

CTTTTAATGA GATGTCCCAT GATTTGCAGG TAAGCTTTGA TTCCTTGGAA GAAAGCGAAC 9420

GAGAAAAGGG CTTGATGATT GCCCAGTTGT CGCATGATAT TAAGACTCCT ATCACTTCGA 9480

TCCAAGCGAC GGTAGAAGGG ATTTTGGATG GGATTATCAA GGAGTCGGAG CAAGCTCATT 9540

ATCTAGCAAC CATTGGACGC CAGACGGAGA GGCTCAATAA ACTGGTTGAG GAGTTGAATT 9600

TTTTGACCCT AAACACAGCT AGAAATCAGG TGGAAACTAC CAGTAAAGAC AGTATTTTTC 9660

TGGACAAGCT CTTAATTGAG TGCATGAGTG AATTTCAGTT TTTGATTGAG CAGGAGAGAA 9720

V- 1 iuLAou i A AIVCCAtiAtsT. t rGCCCGGAT TGAGGGAGAT TATGCTAAGC 9780

TTTCTCGTAT CTTGGTGAAT CTGGTCGATA ACGCTTTTAA ATATTCTGCT CCAGGAACCA 9840

AGCTGGAAGT GGTGGCTAAG CTGGAGAAGG ACCAGCTTTC AATCAGTGTG ACCGATGAAG 9900

GGCAGGGTAT TGCCCCAGAG GATTTGGAAA ATATTTTCAA ACGCCTTTAT CGTGTCGAAA 9960

CTTCGCGTAA CATGAAGACA GGTGGTCATG GATTAGGACT TGCGATTGCG CGTGAATTGG 10020

CCCATCAATT GGGTGGGGAA ATCACAGTCA GCAGCCAGTA CGGTCTAGGA AGTACCTTTA 10080

CCCTCGTTCT CAACCTCTCT GGTAGTGAAA ATAAAGCCTA AAACCCCTTT ACAAATCCAG 10140

CTATTCATGG TAGAATAGAT TTTGTGTGAA ATATCAGCAG GAAAGCATGA AGCTCGTCAA 10200

CAGGTGTCTT ATGACAAGTA ACCTTGGCTG TTTAGGCGAA GGGCATCTGC ACGG 10254



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9769 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

CCGGCGACTA TCGATAACAC TTGACTTGGT AGCCCCACAT TTTGGACAAC GCATCCTTTC 60 

CCTCCTTATC GTTTTCTTTT CATTATACCA TTTTTTAAGC GATTCCCAAA ACAATTCTTC 120

TTTTTGCTTG ACAAGTTTTT TGTTTTGTTG TATTATTTAA TTAAGACAAC AAGGTAAAAG 180 

AAAGGAGACT AAGATGTCCT GGACATTTGA CAACAAAAAA CCCATCTATT TACAGATTAT 240 

GGAGAAAATC AAGCTTCAGA TTGTTTCCCA TACACTGGAA CCCAATCAAC AACTTCCAAC 300 

CGTGAGGAGC TAGCTAGCGA GGCTGGTGTC AATCCCAATA CCATCCAAAG AGCCTTATCA 360

GACCTTGAAC GAGAAGGATT TGTCTACAGC AAGCGAACAA CTGGACGATT TGTGACTAAG 420 

GATAAGGAGC TAATCGCCCA GTCACGCAAA CAATTATCAG AAGAAGAATT GGAACACTTC 480 
GTTTCCTCCA TGACCCATTT TGGCTATGAA AAAGAAGAAC TACCAGGCGT AGTCAGTGAT 540

TATATTAAAG GAGTTTAAGC CTATGTCATT ACTAGTATTT GAAAATGTAT CCAAATCATA 600 

TGGAGCAACA CCAGCCCTTG AAAATGTTTC TCTTGACATT CCAGCTGGAA AAATTGTCGG 660 

CCTTCTTGGG CCAAACGGCT CAGGAAAAAC AACCCTGATT AAACTAATTA ATGGCCTCTT 720 

ACAACCAGAT CAAGGACGTG TCCTCATCAA CGACATGGAC CCAAGCCCAG CAACCAAGGC 780 

CGTTGTAGCT TATTTGCCTG ATACGACCTA TCTCAATGAG CAAATGAAGG TCAAAGAAGC 840 

CCTAACCTAC TTCAAGACCT TCTATAAAGA TTGTCAGATC TTGAACGCGC CCATCATCTA 900 

CTTGCAGACC TGGGCATTGA TGAAAATAGT CGTCTCAAGA AACTATCAAA AGGAAACAAA 960 

GAAAAGGTTC AACTGATTTT GGTTATGAGC CGTGATGCTC GTCTCTATGT TTTGGACGAA 1020 

CCCATTGGTG GGGTGGATCC AGCAGCCCGT GCTTATATCC TCAATACCAT TATCAACAAC 1080 

TACTCACCAA CTTCTACCGT TTTGATTTCT ACCCACTTGA TTTCTGATAT CGAGCCAATC 1140 

TTGGATGAAA TTGTCTTCCT AAAAGACGGA AAAGTCGTCC GTCAAGGAAA TGTAGATGAT 1200 

ATTCGCTACG AGTCAGGTGA ATCCATTGAC CAACTCTTCC GTCAGaATTT AAGGCCTAAG 1260 

CAAAGGAGAT TATTTATGTT TTGGAATTTA GTTCGCTACG AATTTAAAAA TGTTAACAAG 1320 

TGGTATTTAG CCCTCTACGC AGCCGTGCTA GTCCTTTCTG CCCTCATCGG AATACAGACA 1380 

CAAGGCTTTA AAAATCTACC TTACCAAGAA AGTCAGGCTA CTATGCTACT TTTTCTAGCT 1440 

ACAGTCTTTG GTGGCTTGAT GCTTACACTT GGGATTTCAA CCATTTTCTT GATTATTAAA 1500 

CGCTTCAAAG GTAGTGTCTA CGACCGACAA GGCTATCTGA CTTTGACCTT GCCAGTTTCT 1560 

GAACACCATA TCATCACAGC CAAACTAATC GGTGCCTTTA TCTGGTCATT GATTAGCACC 1620 

GCTGTATTGG CTCTAAGTGC TGTTATTATT CTGGCTTTAA CAGCTCCAGA ATGGATTCCT 1680 

CTTTCTTATG TGATTACATT TGTAGAAACA CATCTCCCTC AGATCTTTCT TACAGGTATA 1740 

TCCTTCCTAC TAAATACTAT TTCAGGAATC CTCTGCATCT ACCTGGCTAT TTCCATTGGA 1800 

CAGCTTTTCA ATGAATACCG TACAGCACTC GCTGTTGCAG TCTACATTGG TATCCAAATC 1860 

GTCATTGGAT TTATTGAACT TTTCTTCAAT CTTAGTTCTA ATTTCTATGT CAATTCACTG 1920 

GTAGGACTCA ATGACCATTT CTATATGGGA GCAGGTATAG CCATTGTTGA AGAACTCATA 1980 

TTCATAGCTA TCTTTTATCT CGGAACCTAC TACATCTTGA GAAATAAGGT TAATTTGCTT 2040 

TAAATAATTT TTACCTAGAT ATGTAACATA CTCATAGAAC AAAAGAGACC AGGCAAAAAG 2100 

TCTTTAAAAT TAGAAAACGC ATAGTATCAG GTGTTGAATA TGTACTGCcC CCCAAAAGTT 2160 

AGATTTTTTC TGTCTAACTT TTGGGGGCAG TTCATAAGAA CCTTGGTAAT ATGCGTTTTT 2220 

TGTGAGCTGA CTTATTTCCT TTCACTATAT CGCAAAATGA AATAAGAACG GAACGATGGG 2280

ATTTTGGAAT TCAAATCAAT TTATAAGAAT GTTTTAGAAG TAATATTATC CTATTCCAGA 2340

TTCAGTTCAC TATACAATTG AGTTTTCAAG CAACCTGTTT ACATAATGTG TACATAATTA 2400

GGTTCGTGAT TCCACCCTTT TCACCTTTAA AAACCTCGCT TTCGCAAGGC TCTTCTATTT 2460

ATAAGATAAG GCACGTTTAA AGGTTTTCCA AATCCCTAAA TCATCCGTTT GAAGAAPGAG 2520

ACTAGCATAC ATGCGTCCGA TAAATCCTGT TGCTACCACC GCAAAAATCA PTGTAATAGP 2580

AAGTGAAATC CATGCTTCTG CTCCCCCCGC ATAGTCATTA ATCGTTCGAA ACGGCATAAA 2640


GAAGGTCGAA 


ATAAAGGGAA 


TATAAGAACC 


AATCTTCAAG 


AGGAGATTGT 

x^OvAwn X X VJ X 


PA PP APPTP.P 


2700 


ACCTAGAGCT 


GTCACTCCAA 


AAAAACCACC 


CATAATCAAA 


n X X \^txt\n\3 




£. 1 DU 


TTTCCCTGAG 


TCCTCAGGAC 


GAGAAACCAT 


AGATPtTAGG 

nun X V*V* X /A\JVJ 


& a czn. p*pp. rv* a 


AVa AL. 1 ALu 1 A 


2820 


CATGAAAAGA 


CTGATCAAAA 


TAAAGAGCAA 


VJVJ 1 1 i \_ /\ vl X 


Vj/\Vj/\ 1 AlA, A I 




2880 


ATCCAAAATA 


CCAGACTGAG 


CCAAGAATGG 


P.AAATPTTTA 


a afiififi a a a 


L wWjC V, AVj 


2940 


ACCACCTACA 


ACATAGATCC 


CAATA1\5CGT 


T AAA ATP APT 


nuhhftV. Aunv 




3000 


CGCATAGAAA 


TAGTGACTTG 


\*%*V*X AnluLI 


AGAAAAAAPf: 




TTTTGGTGCC 


3060 


TTTTTCACTG 


GCAACTTCCT 


GAGCTGTTAC 


ACCC.GCATAG 




TPATATAaAP 




AAAGAATCCT 


AAGGCACCTG 


CTGCAATTGT 


TTGAATAAAP. 






J10U 


ATCAATCTTT 


TCTGTGAATT 


GAATTGTCTG 


CGCTAAGCGT 


TTTTPPTPPT 


PTTRAOAPAA 




GGAAGCAGTT 


GAACGATTAA 


GCTGATTTTG 


CAGTTCATTG 




TAAPPTPAAA 


linn 


TTTAATTCCA 


TTTTCAAGCG 


ATGTTTCGCC 


ATGATAAACT 


GCCTTTAGAA 


v«v X J"\ X ^ X X 


3360 


TTGATCAATG GTCAAATAAC CTTTTAATTT TTCTTCTTTA ATTGCTTCTT TGGCACTTGC 3420

TTCGTCTTTA TAGTCGAAGT TAACACCATT TACATTCTTC AGTCCTTCTG CTACAGATGG 3480

CACTGTTGTC ACTACTGCCA CTTTATTATT TTTAGCCATA GAAGAACCTT GGAGATGCCC 3540

AATTCCTACA GAGATTCCTA AAAAGAGGAA CGGCGAAATC ACCATAAAGA AGAAACTCCA 3600

TGACTCGACA TGTCGAAGAT AGGTTTCCTT GATTACAACC CACATATTTC TCATACTTCC 3660

ACTCCTGATT CTAGTTTAAA GATTTCATCG ATAGTTGGCG CTTGTTGGTC AAATGTTGCG 3720

ATATATTGAC CTTGAGTCAA GATTGAGAAG AGTTCCCTTC CAGCGCTCTC ATCCTCCAAA 3780

ATCAATTTCC AACTGCCTTG TTTGGTCAAG CTCACCTGTT TGACATGAGG AAGATTTTCC 3840

AATTCTTCCT TGCTTCGTTC ACTTGAAACA AAGAGACGCG TTTTCCCGTA TTGATTGCGG 3900

ACATCCTGAA CTGGTCCGTG CAAGACCACA CGGCCATCTC GGATCATCAG AATATCGTCA 3960

CAAAGTTCCT CAACATTGGT CATGACATGG TCAGAAAAGA TAATGGTTGT CCGCGCTCTT 4020

TTTCCTGAAA AATGACTTGT TTGAGCAATT CTGTATTAAC TGGGTCCAAT CCACTAAAAG 4080

GCTCATCCAA GATAATCAGG TCTGGTTCAT GAATCAGAGT AATAATGAGC TGAATCTTCT 4140 

GCTGATTTCC TTTTGACAGA CTCTTGATTT TATCTGTCAG CTTTCCTTTC ACTTCCAACC 4200

TCTTCATCCA TTGAGGGAGT TTTTCTTTGA CTTCTTTGGC ATCCATGCCT TTTAGAGTCG 4260 

CCAAGTAGCG AACTTGTTCA AGAACTGTCA ATTTAGGCAT GAGATGCGTT CTTCAGGCAG 4320 

ATAACCAATC CGAGCATAGG TCTCCTGACG AATATCCTGA CCATCCAGAC CGATTTCTCC 4380 

CTGATATTCT AGGAATTTCA AAATACTATG GAAAATCGTT GTTTTTCCAG CACCATTTTT 4440 

TCCGACTAGT CCCAAAATAC GACCTGGTCG CGCTTGAAAG TCAATACCAA ACAAAACTTG 4500 

CTTGGATCCA AAACTTTTCT CTAGACTTCT TACTTCTAGC ATCTTTCACC TCCGAAATTT 4560 

CTTGCACTCA TTATACTCCT TTTTGATAGC CTTTACAATG TTTTTTGTCC ATTTTTAGAA 4620 

GACTATTGCT GTGTAAAATA TGGCCTGGAG CACTTTTATA CTCAATGAAA ATCAAAGAGC 4680 

AAACTAGGAA GCTAGCCGTA GACTGCTCAA AGTACAGCTT TGAGGTTGCA GATAAAACTG 4740 

ACGAAGTCgA CTCAAAACAC TGTTTTGAGG TTGTGGATAG AACTGACGAA kCrTAaCTAT 4800 

ATCTACGGCA AGGCGAAcTG ACGTGGTTTG AAGAGATTTT CGAAGAGTAT TAGTGATAAA 4860 

TCCATTATAC AGCAGCAAAC TTAATTTATA CCTTCCGCTC CTCAACTGTC TATTTTTAAT 4920 

CCTGAATTGT TATTTGAGTA ACTCCTTTTT CCTCGTAAAG TTTTCTTCCT CTAAAACTTC 4980 

TGGAAAAAGG CTAATAGTTT CAGACAACAT TTTTATAAGA AACAAGTTCA TCTGTCATTT 5040 

CAAGAAGGAG TAATCCTTTA TCTACTAATG GACGGAACAG AATTCAACCG CTTGTCCGAT 5100 

ATGTTTTCTA AGGATTATAT AGTAAAATGA AATAAGAACA GGACAAATTG ATCAGGACAG 5160 

TCAAATTGAT TTCTAACAAT GTTTTAGAAG TAGATGTATA CTATTCTAGT TTCAATCTGC 5220 

TATATCTATT ATGCACACCC CTATAGGATC TAATGAAAAT CACAACAGGC TCATTCATAG 5280 

ATGGTTACCT AAGCCTAAGG GAACTAAGAA AACGACTACC AAGGAAGTCG CATTCATCGA 5340 

AAAGTAGATT AACAACTATC CTAAAAAATG CTTGAACTAC AAGTCCCCCA GAGAAGACTT 5400 

CTGGATGACT AACTTGAACT TGAAATTTAG CAATAATTAA TTCACTATCT AACTATATTT 5460 

AGTAATTATT TCAGAACTGA TTAATATTAA AATTAACTAA CAATTCAAAG GATTCATACT 5520 

AGCCATAAAT TACGTCCATC AGAGAGAGAC TCTTACTACT TTTAGATTTT AGTCTTTCTA 5580 

GCTTCAGAAT ACATCTAAAC TTTAGGGAAA ATGACTATTC GAAAGCGCGA ATGCCTCAAA 5640 

ATTATCTCAG ATAAGCTATT CGAAACTTAG AATGCTTTTA AATTTATGGA ATTGCGATTA 5700 

TTCGAAACCT AGAATGCATA TAACCTTTAG TTGACAGACC TATTCTAAGT CTCGAAGGGC 5760 

TATTTACTTT CTATTCCTTA TCAAAAAAGA CTCATTCCCC CTTTCTCCTC CAAAATATGG 5820

TATAGTAGAA ATATACTATC TATGAGGAGT TTACATGTCA CAGGATAAAC AAATGAAAGC 5880

TGTTTCTCCC CTTCTGCAGC GAGTTATCAA TATCTCATCG ATTGTCGGTG GGGTTGGGAG 5940

TTTGATTTTC TGTATTTGGG CTTATCAGGC TGGGATTTTA CAATCCAAGG AAACCCTCTC 6000

TGCCTTTATC CAGCAGGCAG GCATCTGGGG TCCACCTCTC TTTATCTTTT TACAGATTTT 6060

ACAGACTGTC GTCCCTATCA TTCCAGGGGC CTTGACCTCG GTGGCTGGGG TCTTTATCTA 6120

CGGGCACATC ATCGGGACTA TCTACAACTA TATCGGCATC GTGATTGGCT GTGCCATTAT 6180

CTTTTATCTA GTGCGCCTAT ACGGAGCTGC CTTTGTCCAG TCTGTCGTCA GCAAGCGCAC 6240

CTACGACAAG TACATCGACT GGCTAGATAA GGGCAATCGT TTTGACCGCT TCTTTATTTT 6300

TATGATGATT TGGCCCATTA GCCCAGCTGA CTTTCTCTGT ATGCTGGCTG CCCTGACCAA 6360

GATGAGCTTC AAGCGCTACA TGACCATCAT CATTCTGACC AAACCCTTTA CCCTCGTGGT 6420

TTATACCTAC GGTCTGACCT ATATTATTGA CTTTTTCTGG CAAATGCTTT GACACGTAAA 6480

AAATCCGTTT GGTTTCCCAA GTGGATTTTT AAAGCGTAGA TTAACTATAG CTTGATACTA 6540

AATATACTTT GGTATGGAAA TCATGCATAT TTTTCGATAG TGAGGCGAGG ACTTACCTAG 6600

CCTTTCCGCC GTGATAGAAA CACCTGAAAT CTAATGGTTT CAGGTATTCG GAAACTTTGA 6660

GCCTAGTGTC TCAAAGTTTA GGTATGGAAT TTTGAAGAAA GTCGCTACCG TCCGTAATCA 6720

CTTAAGGAAA GGCTCAAAAA TATTGTTTTC AACCACAAAA TCCGTTTGGT TTCCCAAGCG 6780

GATTTTGTGC TTTATTTTGA AACTTCTTTT GCAAGAACAA AGTTCCCAAG TGTGGCAGAA 6840

CCATTTCCTG CGACTGCTGG CGTCACGATA TAGTCACGCA CATCTGGTAC TGGTAGGTAA 6900

CCATTAAGAA GAGATGTAAA TTTCTCACGG ACACGGTCCA GCATATGTTG TTGAGCCATG 6960

ACCCCTCCAC CAAAGACAAT CACGTCTGGG CGGAAAGTCA CTGTCGCATT AACCGCAGCT 7020

TGAGCGATAT AGTAGGCTTG AACATCCCAA ACAGGGTTGT TGAGTTCAAT AGTTTCCCCA 7080

CGTACACCTG TACGAGCTTC CAAACTTGGA CCAGCTGCAT AACCTTCTAG ACATCCCTTA 7140

TGGAAAGGAC AAACACCCTT AAACTCTTTT TCAATATCCA TTGGGTGTCT AGCAACATAA 7200

TAATGACCCA TTTCAGGGTG ACCCACACCA CCGATAAACT CACCACGTTG GATGACGCCT 7260

GCACCGATAC CTGTACCGAT TGTGTAGTAA ACCAAGTTTT CGATACGACC ACCAGCATTG 7320

TTACGGGCAA CCATTTCACC GTAAGCAGAG CTGTTTACGT CTGTTGTGAA GTACATTGGC 7380

ACGTTTAGGG CGCGACGAAG GGCACCAAGC AAGTCTACAT TTGCCCAGTT TGGTTTTGGA 7440

GTCGTCGTGA TAAAGCCATA AGTTTTTGAG TTTTTGTCAA TATCAATCGG CCCAAATGAA 7500

CCAACTGCAA GACCAGCAAG GTTATCGAAT TTTGAGAAGA ACTCAATGGT TTTATCGATT 7560

GTTTCGATTG GAGTTGTTGT TGGAAATTGT GTTTTTTCTA CAACGTTAAA GTTTTCATCA 7620

CCGACAGCAC AGACAAACTT TGTACCGCCC GCTTCCAAGC TTCCATATAA TTTTGTCATG 7680 

ATAAACCTCT TGTTTTTATT TTCTTTATTA TAGCATACTT CGAAAGTCTA AATGTCTCTA 7740 

TTTTTTAGAT TTTCCTCTGT AAATCTTACT ATCTAATAAA AACGAACAAA CATGTCATTT 7800 

GTTCGTTTTC ACATTAGAGA GGATTGATTA GATTTTCACT TCGATCACAG CATCCCCCTT 7860 

AGCAACTGAA CCTGTTGCGA CTGGAGCTAC TGAAGCGTAG TCACCTGTAT TTGTAACGAT 7920 

AACCATTGTT GTATCATCAA GTCCAGCTGC AGCGATTTTG TTTGAGTCAA ATGTTCCAAG 7980 

AACATCGCCA GCTTTCACCT TATTACCTTG AGCAACTTTT GTTTCAAAAC CGTCACCGTT 8040

CATAGATACA GTATCAATAC CAACATGAAT CAAAACTTCA GCACCATTTC TTGTTTTCAA 8100 

ACCAAAAGCG TGCCCTGTTG GAAAGGCAAT TGAAACTTCA GCATCAGCTG GTGCATAGAC 8160 

CACGCCTTGG CTTGGTTTCA CAACGATACC TTGTCCCATA GCTCCACTTG AGAAGACTGG 8220 

GTCATTGACA TCAGCAAGAG CGACAACATC ACCGACGATA GGAGTTACAA GTGTTTCATT 8280 

TTGAAGAGCT GCTGGCGCAA CTTCTTCTTT TTCTTCAGCC ACTTCAGCTC GTTTTGCAGC 8340 

TGCAGTTGCG TCTACTTCAT CTTCGTAACC AAACATGTAA GTAAGAGCAA AACCAAGGGC 8400 

AAATGATACA GCTACCATAA GAAGGTATTG TGGAAGTTGT CCGTTACCAA CATAAAGCAT 8460 

TGTACCAGGG ATGATGGTGA TACCATTACC AGTACCAGCA AGTCCAAGGA TAGAAGCCAA 8520 

TCCACCACCG ATTGCACCAG CAATCAATGA AAGGAAGAAT GGTTTACGGA AGCGCAAGTT 8580 

CACCCCGAAG ATAGCAGGCT CTGTAATACC TAGGAAGGCA GAAAGAGCAG CCGGGAAAGC 8640 

AAGTGTTTTC AGTTTTGGAT TTTTTGTTTT AACACCAACC GCAACAGTAG CAGCACCTTG 8700 

AGCTGTCATA GCAGCTGTGA TGATAGCGTT GAATGGGTTA GCATGGTCAG CAGCAAGTAA 8760 

TTGCACTTCA AGCAAGTTGA AGATGTGGTG CACACCTGAC ACGACGATCA ATTGGTGAAC 8820 

CCCACCAATC AAGAAACCAC CAAGACCAAA TGGCATGCTA AGAATCGCTT TTGTAGCAAT 8880 

AAGGATGTAG TTTTCAACAA CGTGGAAAAC TGGTCCAATG ACAAAGAGTC CAAGGATAGA 8940 

CATGACCAAA AGTGTCACGA ATGGTGTTAC CAAGAGGTCA ATGACATCTG GAACAACTTG 9000 

CGGACAGCTT TTTCAAATTT AGCTCCGACA ACCCCGATGA TGAAGGCTGG AAGAACGGAA 9060 

CCTTGCAAAC CAACAACAGG GATGAAACCA AAGAAGTTCA TCGCTGTTAC TTCACCACCT 9120 

TGAGCAACTG CCCAAGCGTT TGGAAGTGAG CCAGAGACAA GCATCATACC AAGAACGATA 9180 

CCAACGGCAG GATTTCCACC AAATACACGG AAGGTTGACC ACACAACCAA ACCTGGCAAG 9240 

ATGATGAAGG CTGTATCTGT CAAGATTTGT GTGTAAGTTG CAAAGTCACC TGGAAGTGGC 9300 
ATTTCAAGAG CGTTGAAAAG ACCACGCACA CCCATGAAGA GACCTGTCGC TACGATAACT 9360

GGGATGATTG GAACGAAAAC ATCACCAAAA GTACGGATAG CACGTTGGAA CCAGTTCCCT 9420 

TGTTTAGCAA CTTCTGCTTT CATGTCATCC TTAGATGATG TTGGTAATCC AAGTACAACA 9480 

ACTTCATCGT ACATTTTGTT AACTGTACCT GTACCAAAGA TAATTTGGTA TTGCCCTGAG 9540 

TTAAAGAAAG CACCTTGAAC TTTTTCCAAG TTCTCAATCA CTTCTTTATT GATTTTCTCT 9600 

TCATCTTTGA CCATGACACG TAGACGAGTC GCACAGTGGG CAACACTATT GACATTTTCA 9660 

CGTCCGCCCA AGGCATCGAT GACTTTTTTT GCAATTTCCT GATTGTTCAT TTGCAAAAAT 9720 

CTCCTTATAT AACATTTTGT TCTTGTTTGA AAGCGATTTT ATTCGCCGG 9769 



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3149 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

CGCTTGAGTG CTAATTCATA GTTCTATTGT ATCACTTGGT CAGAAATAAT CAAGAAAAAA 60

GTCTGACTTT CTCAAGATAA AAAGCCTGAG ACCAACTCAG ACTTTTTAAT TCTTAAAATG 120

GCAATTCTTC CTCTTCCAAG ACCAAATCTG CCAAATCTTG GCCTGCATTA TTTTCACGCA 180

TAGCACGTTG GGCACGACTT TCCAAGAGTT GGAATCCTGT GACAAGTACT TCGGTCACGT 240

AGTTCATTTG GCCATTTTTC TCAAAGCGAC GGGTACGCAA TTCTCCATCA ACGGAAATGA 300

GACTACCTTT GGTTGCGTAC TTGCCAAAGT TTCTGCTAGT CTGCCCCATA GGACCATATT 360

GACAAAATCA GCTTCACGTT CACCGTTTTG GTCTTTGTAA CGACGGTTCA CAGCGATAGT 420

TGCTCGCGCT ACCGACTTGT CATTGTTGGT TTTGTGCAAT TCTGGTGTAG ACGTTAAACG 480

TCCAATCAAG ATAACTTTAT TATACATATT TTCTTCCTCC TACTTATCTA TTCGTAGGAA 540

ATCAAAAAAA GTTACAGAAA TTTGTAACTT TTCGAGAAAA TTTTTTATTT TTTATGAACC 600

ATGAAACCTG TCGCCTGTTG ATTGGCCATA ATGGTCATAT CTGTAATCTG AACACGACGA 660

GGTTGACTAG TCACATAGAC TACTGTATCT GCAATATCCT GAGCTTGCAA AGCTTCTATT 720

CCTTGGTAAA CGGACGCAGC TCGTTCTTTA TCACCATGAA AACGCACTGT AGAAAAATCT 780

GTTTCGACAA TTCCAGGCTG AATGGTCGTC ACCTTGATAT CCGTTGCGAT GGTATCAATT 840

CGCAGTCCAT CTGAAAAGGT CTTAACTGCC GCCTTGGTGG CTGAGTAAAC AGCTGCACCA 900

GCATAGGCAT AAATTCCTGC GGTTGACCCC ATATTGATAA TATGACCTTG ATTGGCTTTT 960

ACCATTGCTG GCAAGAAACA GCGAGTGACT GCCATCAAAC CTTTGACATT GGTATCCAAC 1020

ATGGTCAGCA TATCCAACTC TTCATAGTCT TGATAGGGAG CTAAGCCAAG AGCCAGTCCT 1080 

GCGTTATTGA CCAGGATGTC AATCTGACCT ATCGTTTCTA AAATATCAGA GCAGACAGTC 1140 

TTTACCATTG TCATATCCGT GACATCTAGG AGAAAAGTCC AAACTGTTTG ATTTGGAAAA 1200 

GTTTCTGCAA ACTCCGCCTT AAGAGCTTCT AGTCTGTCTA TCCGTCGTCC TGTTAGAACG 1260 

ACATCCTCAC CCTGCTCCAG ATAAGCACGC GCAATCGCTT CACCGATTCC TGATGTCGCT 1320 

CCTGTAATCA CAACATTTTT TGCCATCTTA TTTCCTTCTA GCTGGTCTAT CAGATATTAA 1380 

CAACTTCTTA GGCAGTCCAG TGTTTCGCTG GGTCGAACGG TGTTCCGACA ACTTGGTCTT 1440 

CTGATAATTC AAGCACCCCA CGTTTTTGTG GAGCATTTGG CAGATGCAAT TCACGAGGAC 1500 

TGCACATCAT ACCAAAACTC TTTTCACCAC GAAGTTCACC TGGGAAAATG AGATTCCCTT 1560 

TTGGCATCAT AGCTCCAGGA AGCGCGACAA TGGTTTTCAA CCCCACACGC GCATTGGGAG 1620 

CTCCTGCAAC GATTTGTACA GTCTTATCAC TTGCGACTGC AACTTGGCAG ATGTTGAGGT 1680 

GGTCACTATC TGGATGGGCT ACCATCTCAA CAATTTCACC TACAACAAAC TTAGGTTCCT 1740 

TATCATTAAC AATTTCTTCT GTAAAACCTT CCGCCTGCAA CTCTTGGTTC AAACGAGCGA 1800 

CTTGCTCATC TGTCAAAAAG ACTTGACCGC GCTCTGCAAT TTCAAATAAA CTTGAAACTT 1860 

CGAAAATATT CCAAGCCACT GTTTCCCCAT TATCTTTGAG AAAAACACGG GCTACCTTGC 1920 

CTTTGCGCTC CACATCCAGT TTGGCATCTC CGCTATTTTT CACGATGACC ATAAGGACAT 1980 

CACCGACATG TTCTTTATTA TATGTAAAAA TCATTGTTTC CTTTTTCTCC TATTTCAGTC 2040 

CTGCTAAAAA GTCATTGATT TGTTGCTTGC TTTTACGGTC GCGATTGACA AAACGACCGA 2100 

TTTCCTTGTC CTTTTCTAGA ACAACAAGGC TAGGAATTCC GTAAACATCC CAGAGTTTGG 2160 

CCAAATCCAT ATACTGATCT CGGTCCATTC GAATAAAGGT GAACTCTGGA TTGGTCTCCT 2220 

CAATCTCTGG TAAGGCAGGA TAAATATAAC GACAATCGCT ACACCAGTCT GCCACAAAAA 2280 

TGAAGACCTT CTTGCCCGCT TTTTCCACTA AAGATGCTAA TTCTTCTAAA CTTGCTGGCT 2340 

GTATCATAAG ACTTCCTCCT CATAGACTAG GTCTTCATTT TCATAGACAA AGGTATAATG 2400 

ACGGCCATCC TCAAAAATGA CGCCACCAAC CAAGCTCTCC AGACTGCTTT CGTAAACTTG 2460 

AACATAAAGG GTCGCAATTT CCCCCATGTC GGAAAAATGG TCTCGCACAA TCTCTGTCAA 2520 

CTCTTCCTGA GTCTTCATGA GCTTACGGTC ATCTGCAACT TTTTTCGTAG CAAGAGCAAG 2580 

GCTTCCGATA CCTAGCAGAG CCAAGCCTGC CATCCACATT TTTTTAGCTT TCATACCATT 2640 

CATTTTAACA CAAAAAAGGC TTCAGGACAA ATGAGGAAGC AGCAGAAAAG CAAGTAAAAA 2700 



WO 98/18931 



PCT/US97/19588 



GCCTCTTCCT TTAAGGAAAA GGACTTCTTA TACTCAATGA AAATCAAAGA CCAAACTAGG 2760

AAGCTAGCCG CAGGCTGCTC AAAGCACTGC TTTGAGGTTG TAGATAGAAC TGACGAgTCa 2820

CTCAAAACAC TGTTTTGAGG TTGTGGATGA AGCTGACGTG GTTTGAAGAG ATTTTCGAAG 2880

AGTATTATTC TTATTGCCAG GCACCTAAGT TGCCAACGTA GTAACTATCA GGTGTGTAGG 2940

TATTGCGAGC ATCTTACCTG ATGAAGCCAG ATAATACTAC TTGCCATTGT CTTTGACCCA 3000

ATCATTCGCA ATCATGGAAC CAGAAGAACT TACATAATAC CATTCTCCCT TGTCATAAAC 3060

CCAAGTACTG ACTTTCATGG TTCCTGAGCA ATTAAAGGCA AAAAAACTGT CCAATAACAT 3120

TCGTTTTTTA AAAGCATTTG ACACTACAT 3149



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10240 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

CCAAAAATTC AACCTTTAAG GGGAGTCCAG AGAGACTCAC AAGGTGTCAG ATAAAAGAAT 60 

GGTGCAATTT TCTAGAGGAG ACTTTTTGAG TGTGCTCTCT TGTGTTGTAC GATTTTAACT 120 

GAGGCCTTGC ACTAGCAAGG TCTTTTCTTT ATCTGGTCCC CTTAAAATTT AAGGAGGAAA 180 

AGTTATGAAT CCCACATGTA AGAAGCGTTT GGGTGTCATT CGGTTGGAAA CCATGAAGGT 240 

GGTTGCACAA GAGGAAATCG CGCCACAATC TTTGAATTAG TCCTAGAAGG AGAAATGGTT 300 

GAAGCCATGC GAGCAGGCCA ATTTCTTCAT CTGCGTGTAC CGGACGATGC CCATCTCTTA 360 

CGTCGTCCTA TTTCAATTTC GTCTATTGAC AAGGCAAACA AGCAGTGTCA CCTCATTTAT 420 

CGGATTGACG GAGCTGGGAC TGCAATTTTT TCAACCTTAA GTCAGGGAGA CACTCTTGAT 480 

GTGATGGGGC CTCAGGGAAA TGGTTTTGAC TTGTCTGACC TTGATGAGCA GAATCAGGTT 540 

CTCCTTGTTG GTGGTGGGAT TGGTGTTCCA CCCTTGCTTG AGGTGGCCAA GGAATTGCAT 600 

GAACGTGGAG TGAAAGTAGT GACAGTCCTC GGTTTTGCTA ATAAGGATGC TGTTATTTTG 660

AAAACGGAAT TGGCTCAGTA TGGTCAGGTC TTTGTAACGA CAGATGATGG TTCTTATGGC 720 

ATCAAGGGAA ATGTTTCCGT TGTTATCAAT GATTTAGACA GTCAGTTTGA TGCTGTTTAC 780 

TCGTGTGGGG CTCCAGGAAT GATGAAGTAT ATCAATCAAA CCTTTGATGA TCACCCAAGA 840 

GCCTATTTAT CTCTGGAATC TCGTATGGCT TGTGGGATGG GAGCTTGCTA TGCCTGTGTT 900 

CTAAAAGTAC CAGAAAACGA GACGGTCAGC CAACGCGTCT GTGAAGATGG TCCTGTTTTC 960 
CGCACAGGAA CAGTTGTATT ATAAGGAGAA AATTATGACT ACAAATCGAT TACAAGTTTC 1020

TCTACCTGGT TTGGATTTGA AAAATCCGAT TATTCCAGCA TCAGGCTGTT TTGGCTTTGG 1080 

ACAAGAGTAT GCCAAGTACT ATGATTTAGA CCTTTTAGGT TCTATTATGA TCAAGGCGAC 1140 

AACCCTTGAA CCACGTTTTG GGAATCCAAC TCCAAGAGTG GCAGAGACGC CTGCTGGTAT 1200 

GCTCAATGCA ATTGGCTTGC AAAATCCTGG TTTAGAGGTT GTTTTGGCTG AAAAGCTACC 1260 

TTGGCTGGAA AGAGAATATC CAAATCTTCC TATTATTGCC AATGTAGCTG GTTTTTCAAA 1320 

ACAAGAGTAT GCAGCTGTTT CTCATGGGAT TTCCAAGGCA ACTAATGTAA AAGCTATCGA 1380

GCTCAATATT TCTTGTCCCA ATGTTGACCA CTGTAATCAT GGACTTTTGA TTGGTCAAGA 1440 

TCCAGATTTG GCTTATGATG TGGTGAAAGC AGCTGTGGAA GCCTCAGAAG TGCCAGTTTA 1500 

TGTCAAATTA ACCCCGAGTG TGACCGATAT CGTTACTGTC GCAAAAGCTG CAGAAGATGC 1560 

GGGAGCAAGT GGCTTGACCA TGATCAATAC TCTGGTTGGA ATGCGCTTTG ACCTCAAAAC 1620 

TAGAAAACCA ATCTTGGCCA ATGGAACAGG TGGAATGTCT GGTCCAGCAG TCTTTCCAGT 1680 

AGCCCTCAAA CTCATCCGCC AAGTTGCCCA AACAACAGAC CTGCCTATCA TTGGAATGGG 1740 

AGGAGTGGAT TCGGCTGAAG CTGCCCTAGA AATGTATCTG GCTGGGGCAT CTGCTATCGG 1800 

AGTTGGAACA GCTAACTTTA CCAATCCTTA TGCCTGCCCT GACATCATCG AAAATTTACC 1860 

AAAAGTCATG GATAAATACG GTATTAGCAG TCTGGAAGAA CTCCGTCAGG AAGTAAAAGA 1920 

GTCTCTGAGG TAAACTGCAA TCAATCTGTT CTTGATTTTT TATTAGTTTG TAATATGAAT 1980 

TTAGGAGAAT TTTGGTACAA TAAAATAAAT AAGAACAGAG GAAGAAGGTT AATGAAGAAA 2040 

GTAAGATTTA TTTTTTTAGC TCTGCTATTT TTCTTAGCTA GTCCAGAGGG TGCAATGGCT 2100 

AGTGATGGTA CTTGGCAAGG AAAACAGTAT CTGAAAGAAG ATGGCAGTCA AGCAGCAAAT 2160 

GAGTGGGTTT TTGATACTCA TTATCAATCT TGGTTCTATA TAAAAGCAGA TGCTAACTAT 2220 

GCTGAAAATG AATGGCTAAA GCAAGGTGAC GACTATTTTT ACCTCAAATC TGGTGGCTAT 2280 

ATGGCCAAAT CAGAATGGGT AGAAGACAAG GGAGCCTTTT ATTATCTTGA CCAAGATGGA 2340 

AAGATGAAAA GAAATGCTTG GGTAGGAACT TCCTATGTTG GTGCAACAGG TGCCAAAGTA 2400 

ATAGAAGACT GGGTCTATGA TTCTCAATAC GATGCTTGGT TTTATATCAA AGCAGATGGA 2460

CAGCACGCAG AGAAAGAATG GCTCCAAATT AAAGGGAAGG ACTATTATTT CAAATCCGGT 2520 

GGTTATCTAC TGACAAGTCA GTGGATTAAT CAAGCTTATG TGAATGCTAG TGGTGCCAAA 2580 

GTACAGCAAG GTTGGCTTTT TGACAAACAA TACCAATCTT GGTTTTACAT CAAAGAAAAT 2640 

GGAAACTATG CTGATAAAGA ATGGATTTTC GAGAATGGTC ACTATTATTA TCTAAAATCC 2700 
GGTGGyTACA TGGCAGCCAA TGAATGGATT TGGGATAAGG AATCTTGGTT TTATCTCAAA 2760

TyTGATGGGA AAATrGCTGA AAAAGAATGG GTCTACGATT CTCATAGTCA AGCTTGGTAC 2820 

TACTTCAAAT CCGGTGGTTA CATGACAGCC AATGAATGGA TTTGGGATAA GGAATCTTGG 2880 

TTTTACCTCA AATCTGATGG GAAAATAGCT GAAAAAGAAT GGGTCTACGA TTCTCATAGT 2940 

CAAGCTTGGT ACTACTTCAA ATCTGGTGGC TACATGGCGA AAAATGAGAC AGTAGATGGT 3000 

TATCAGCTTG GAAGCGATGG TAAATGGCTT GGAGGAAAAA CTACAAATGA AAATGCTGCT 3060 

TACTATCAAG TAGTGCCTGT TACAGCCAAT GTTTATGATT CAGATGGTGA AAAGCTTTCC 3120 

TATATATCGC AAGGTAGTGT CGTATGGCTA GATAAGGATA GAAAAAGTGA TGACAAGCGC 3180 

TTGGCTATTA CTATTTCTGG TTTGTCAGGC TATATGAAAA CAGAAGATTT ACAAGCGCTA 3240 

GATGCTAGTA AGGACTTTAT CCCTTATTAT GAGAGTGATG GCCACCGTTT TTATCACTAT 3300 

GTGGCTCAGA ATGCTAGTAT CCCAGTAGCT TCTCATCTTT CTGATATGGA AGTAGGCAAG 3360 

AAATATTATT CGGCAGATGG CCTGCATTTT GATGGTTTTA AGCTTGAGAA TCCCTTCCTT 3420 

TTCAAAGATT TAACAGAGGC TACAAACTAC AGTGCTGAAG AATTGGATAA GGTATTTAGT 3480 

TTGCTAAACA TTAACAATAG CCTTTTGGAG AACAAGGGCG CTACTTTTAA GGAAGCCGAA 3540 

GAACATTACC ATATCAATGC TCTTTATCTC CTTGCCCATA GTGCCCTAGA AAGTAACTGG 3600 

GGAAGAAGTA AAATTGCCAA AGATAAGAAT AATTTCTTTG GCATTACAGC CTATGATACG 3660 

ACCCCTTACC TTTCTGCTAA GACATTTGAT GATGTGGATA AGGGAATTTT AGGTGCAACC 3720 

AAGTGGATTA AGGAAAATTA TATCGATAGG GGAAGAACTT TCCTTGGAAA CAAGGCTTCT 3780 

GGTATGAATG TGGAATATGC TTCAGACCCT TATTGGGGCG AAAAAATTGC TAGTGTGATG 3840 

ATGAAAATCA ATGAGAAGCT AGGTGGCAAA GATTAGTACT ATAAGTGAAT ATGATTTGAG 3900 

TGAATAGTAA GTTAAAAATC CTGATTTCAA GTAAAATCAG GATTTTTTCA TGGATGCAAT 3960 

TTTTTTGGAG TCTGGTGTGA CGCGGAGGGT CTTTTGTCCT GTGTAAGTGA CAAAGCCGGG 4020 

TTTTCCACCA GTTGGTTTAT TGAGTTTTTT GACTTCAATC ATATCTACCT GCACCAGATT 4080 

CGACAGGCGC CCTTGAGAGA AGTAGGCAGC TAACTCTGCT GCGTCTGTCT TGACTGCATC 4140 

AGATGGGTCA AGATTTCCTG AGATGACAAC ATGGCTTCCA GGAATGTCCT TAGCATGGAA 4200 

CCAAAGTTCC TCCTTGCGGG CCATTTTAAA GGTCAATTCG TCATTTTGAA GATTGTTTCG 4260 

TCCGACATAG ATGATGGTTT TGCCATCGCT TGCTAGATAT TGTTCTAGTT TTTTGCGTTT 4320 

CTGGATTTTC TCCCGTTGTC TTCTGCGGAT AAAACCTGTT TGAATCAATT CTTCACGGAT 4380 

TTCAGCGATT TCTTCCAGTC CAGCTTGGTT GAGGACGGTT TCTACACTTT CCAGATAGAG 4440 

AATAGTGGCT TTGGTTTCTT CAATCAAATC AGTCAAGTAT TTGACAGCTT CTTTGAGTTT 4500 
CTGATACCGT TTAAAATAGC GTTGGGCATT CTGGTTGGGA GTCAGAGCCT TATCAAGCGC 4560

AATCATGATA GGTTGGTTGG TATAGTAGTT GTCTAGGATA ACCTGGTCTT GGTCGTTAGG 4620 

CACTTGGTGG AGGAAGGTTG TCAGCAATTC TCCTTTTTGA CGAAATTCTT CAGCGTTGTC 4680 

TGTCGCCAGT AACTCTTTTT CCTGTTTTTT GAGTTTGTGT CGGTTTTTCT GAAGTTCATT 4740

TTCAACACGA CGAATCAGTT CACTGGCCTG CTGTTTGACG CGGTCGCGCT CAGCCTTATC 4800 

CTTATAGTAG GTGTCCAACA AATCAGAAAG ATTTGCAAAA GGCTCTCCCA CCTGATTTGC 4860 

AAAAGGAACT GGACTGAAGG AAGTCTCAGT CAAGCATGGC TTGGTTTCTT GATTGAAAAA 4920 

ATTTCGGAAA GCGGAAAGTT TTTCACTAAC CAGTATCCTT TCCAATTCAT TTGCCGTATC 4980 

GCGTCCCAGA CCTTGAAAGA GGCTTTGAAG ATTTTTTGCT GTTAGTTCTT GGGTTTGCAG 5040 

GATTTCAAAG AGCTTTTCAT CCTTGATAGT AAAAGGATTG AGAGATTTTG TACTTGGCGG 5100 

AGCGATATAG GTCGATCCTG GAAGTAAGGT GCGGTAGCTA TTTTGTGAAA AGCCGACGTG 5160 

TTTGATAACT TCGAGGATTT TATGACTGCT TTTATCGACC AGTAGAATAT TACTGTGTTT 5220 

CCCCATAATT TCGATAATCA AGGTAGCCTG GATATGGTCT CCAATCTCGT TTTTATTGGA 5280 

AACTGTAATT TCCACAATAC GGTCATTTTC CACTTGCTCA ATCGACTCAA TCAGGGCCCC 5340 

CTGCAAATAC TTTCTCAAAA CCATGATAAA GGTAGAAGGT TGAGCTGGAT TTTCAAAAGT 5400 

CGTTTGGGTC AGCTGAATGC GTCCAAAAAC TGGATGGGCA GAAAGGAGCA GGCGATGGCT 5460

TTGGCGATTG CTGCGGATTT GCAAGACCAA CTCTTGTTCA AAAGGCTGAT TGATTTTCTG 5520 

GATGCGACCA TTCACTAATT CGCTTCGCAA TTCCTCAACT ATGTGGTGTA AAAAAAATCC 5580 

GTCAAATGAC ATCGTTCTCT CCTTGTGATT GTATTCCATA GTATTATATC AAAAAGGTAG 5640 

AATAAAATCA TGGAAATGTG GTATAATAAA GCCAAGTAAA GAGAAACGAG AAGCACATGT 5700 

ATATTGAAAT GGTAGATGAA ACTGGTCAAG TTTCAAAAGA AATGTTGCAA CAAACCCAAG 5760 

AAATTTTGGA ATTTGCAGCC CAAAAATTAG GAAAAGAAGA CAAGGAGATG GCAGTCACTT 5820 

TTGTGACCAA TGAGCGTAGT CATGAACTTA ATCTGGAGTA CCGTAACACC GACCGTCCGA 5880 

CAGATGTCAT CAGCCTTGAG TATAAACCAG AATTGGAAAT TGCCTTTGAC GAAGAGGATT 5940 

TGCTTGAAAA TTCAGAATTG GCAGAGATGA TGTCTGAGTT TGATGCCTAT ATTGGGGAAT 6000 

TGTTCATCTC TATCGATAAG GCTCATGAGC AGGCCGAAGA ATATGGTCAC AGCTTTGAGC 6060 

GTGAGATGGG CTTCTTGGCA GTACACGGCT TTTTACATAT TAACGGCTAT GATCACTACA 6120 

CTCCGGAAGA AGAAGCGGAG ATGTTCGGTT TACAAGAAGA AATTTTGACA GCCTATGGAC 6180 

TCACAAGACA ATAAACGAAA ATGGAAAAAT CGTGACTTGA TATCCAGTTT AGAATTTGCT 6240 

TTGACAGGTA TTTTTACTGC TATCAAGGAA GAACGCAATA TGCGAAAACA CGCAGTGACG 6300

GCTCTAGTGG TCATCCTTGC AGGTTTTGTT TTTCAGGTGT CACGAATCGA ATGGCTCTTT 6360

CTCCTATTGA GTATTTTCTT GGTAGTAGCC TTTGAGATTA TCAACTCTGC TATTGAAAAT 6420

GTGGTGGATT TGGCCAGTCA CTATCACTTT TCCATGCTGG CTAAAAATGC CAAGGATATG 6480

GCGGCCGGCG CGGTATTAGT GGTTTCTCTT TTCGCAGCCT TAACAGGCGC ATTGATTTTT 6540

CTCCCACGAA TCTGGGATTT ATTATTTTAA ACAGTAAGAG GAAATTATGA CTTTTAAATC 6600

AGGCTTTGTA GCCATTTTAG GACGTCCCAA TGTTGGGAAG TCAACCTTTT TAAATCACGT 6660

TATGGGGCAA AAGATTGCCA TCATGAGTGA CAAGGCGCAG ACAACGCGCA ATAAAATCAT 6720


GGGAATTTAC 


ACGACTGATA 


AGGAGCAAAT 


TGTCTTTATC 


GACACACCAG 


GGATTPAr'AA 


o / o V 


GCCTAAAACA 


GCTCTCGGAG 


ATTTCATGGT 


TGAGTCTGCC 


TACAGTACCC 




6840 


GGACACTGTT 


CTTTTCATGG 


TGCCTGCTGA 


TGAAGCGCGT 


GGTAAGGGGG 


APn AT ATHI AT* 




TATCGAGCGT 


CTCAAGGCTG 


CCAAGGTTCC 


1 Wi 1 1 A X VJ 


Iw l\Mxn 1 M. 


a a ATWia'paa 


DjDU 


GGTCCATCCA 


GACCAGCTCT 


TGTCTCAGAT 


TGATGACTTC 


fT2TAATPAAA 


T^naoTTTa a 

1 ou/V_ Ill ftft 


/ u«u 


GGAAATTGTT 


CCAATCTCAG 


CCCTTCAGGG 


AAATAACGTG 


TCTCGTCTAG 


TYIft AT* A TTTT 


7080 


GAGTGAAAAT 


CTGGATGAAG 


GTTTCCAATA 


TTTCCCGTCT 


GATCAAATCA 




7140 


AGAACGTTTC TTGGTTTCAG AAATGGTTCG CGAGAAAGTC TTGCACCTAA CTCGTGAAGA 7200

GATTCCGCAT TCTGTAGCAG TAGTTGTTGA CTCTATGAAA CGAGACGAAG AGACAGACAA 7260

GGTTCACATC CGTGCAACCA TCATGGTCGA GCGCGATAGC CAAAAAGGGA TTATCATCGG 7320

TAAAGGTGGC GCTATGCTTA AGAAAATCGG TAGCATGGCC CGTCGTGATA TCGAACTCAT 7380

GCTAGGAGAC AAGGTCTTCC TAGAAACCTG GGTCAAGGTC AAGAAAAACT GGCGCGATAA 7440

AAAGCTAGAT TTGGCTGACT TTGGCTATAA TGAAAGAGAA TACTAAGTAG AGGTAGGCTC 7500

ATGCCTGCTT CTTGTTTTTA CAGAAGGAGG ACTTATGCCT GAATTACCTG AGGTTGAAAC 7560

CGTTTGTCGT GGCTTAGAAA AATTGATTAT AGGAAAGAAG ATTTCGAGTA TAGAAATTCG 7620

CTACCCCAAG ATGATTAAGA CGGATTTGGA AGAGTTTCAA AGGGAATTGC CTAGTCAGAT 7680

TATCGAGTCA ATGGGACGTC GTGGAAAATA TTTGCTTTTT TATCTGACAG ACAAGGTCTT 7740

GATTTCCCAT TTGCGGATGG AGGGCAAGTA TTTTTACTAT CCAGACCAAG GACCTGAACG 7800

CAAGCATGCC CATGTTTTCT TTCATTTTGA AGATGGTGGC ACGCTTGTTT ATGAGGATGT 7860

TCGCAAGTTT GGAACCATGG AACTCTTGGT GCCTGACCTT TTAGACGTCT ACTTTATTTC 7920

TAAAAAATTA GGTCCTGAAC CAAGCGAACA AGACTTTGAT TTACAGGTCT TTCAATCTGC 7980

CCTTGCCAAG TCCAAAAAGC CTATCAAATC CCATCTCCTA GACCAGACCT TGGTAGCTGG 8040

ACTTGGCAAT ATCTATGTGG ATGAGGTTCT CTGGCGAGCT CAGGTTCATC CAGCTAGACC 8100

TTCCCAGACT TTGACAGCAG AAGAAGCGAC TGCCATTCAT GACCAGACCA TTGCTGTTTT 8160 

GGGCCAGGCT GTTGAAAAAG GTGGCTCCAC CATTCGGACT TATACCAATG CCTTTGGGGA 8220 

AGATGGAAGC ATGCAGGACT TTCATCAGGT CTATGATAAG ACTGGTCAAG AATGTGTACG 8280 

CTGTGGTACC ATCATTGAGA AAATTCAACT AGGCGGACGT GGAACCCACT TTTGTCCAAA 8340 

CTGTCAAAGG AGGGACTGAT GGGAAAAATC ATCGGAATCA CTGGGGGAAT TGCCTCTGGT 8400 

AAGTCAACTG TGACAAATTT TCTAAGACAG CAAGGCTTTC AAGTAGTGGA TGCCGACGCA 8460

GTCGTCCACC AACTACAGAA ACCTGGTGGT CGTCTGTTTG AGGCTCTAGT ACAGCACTTT 8520 

GGGCAAGAAA TCATTCTTGA AAACGGAGAA CTCAATCGCC CTCTCCTAGC TAGTCTCATC 8580 

TTTTCAAATC CTGATGAACG AGAATGGTCT AAGCAAATTC AAGGGGAGAT TATCCGTGAG 8640 

GAACTGGCTA CTTTGAGAGA ACAGTTGGCT CAGACAGAAG AGATTTTCTT CATGGATATT 8700 

CCCCTACTTT TTGAGCAGGA CTACAGCGAT TGGTTTGCTG AGACTTGGTT GGTCTATGTG 8760 

GACCGAGATG CCCAAGTGGA ACGCTTAATG AAAAGGGACC AGTTGTCCAA AGATGAAGCT 8820 

GAGTCTCGTC TGGCAGCCCA GTGGCCTTTA GAAAAAAAGA AAGATTTGGC CAGCCAGGTT 8880 

CTTGATAATA ATGGCAATCA GAACCAGCTT CTTAATCAAG TGCATATCCT TCTTGAGGGA 8940 

GGTAGGCAAG ATGACAGAGA TTAACTGGAA GGATAATCTG CGCATTGCCT GGTTTGGTAA 9000 

TTTTCTGACA GGAGCCAGTA TTTCTTTGGT TGTACCTTTT ATGCCCATCT TCGTGGAAAA 9060 

TCTAGGTGTA GGGAGTCAGC AAGTCGCTTT TTATGCAGGC TTAGCAATTT CTGTCTCTGC 9120 

TATTTCCGCG GCGCTCTTTT CTCCTATTTG GGGTATTCTT GCTGACAAAT ACGGCCGAAA 9180 

ACCCATGATG ATTCGGGCAG GTCTTGCTAT GACTATCACT ATGGGAGGCT TGGCCTTTGT 9240 

CCCAAATATC TATTGGTTAA TCTTTCTTCG TTTACTAAAC GGTGTATTTG CAGGTTTTGT 9300 

TCCTAATGCA ACGGCACTGA TAGCCAGTCA GGTTCCAAAG GAGAAATCAG GCTCTGCCTT 9360 

AGGTACTTTG TCTACAGGCG TAGTTGCAGG TACTCTAACT GGTCCCTTTA TTGGTGGCTT 9420 

TATCGCAGAA TTATTTGGCA TTCGTACAGT TTTCTTACTG GTTGGTAGTT TTCTATTTTT 9480 

AGCTGCTATT TTGACTATTT GCTTTATCAA GGAAGATTTT CAACCAGTAG CCAAGGAAAA 9540 

GGCTATTCCA ACAAAGGAAT TATTTACCTC GGTTAAATAT CCCTATCTTT TGCTCAATCT 9600 

CTTTTTAACC AGTTTTGTCA TCCAATTTTC AGCTCAATCG ATTGGCCCTA TTTTGGCTCT 9660 

TTATGTACGC GACTTAGGGC AGACAGAGAA TCTTCTTTTT GTCTCTGGTT TGATTGTGTC 9720 

CAGTATGGGC TTTTCCAGCA TGATGAGTGC AGGAGTCATG GGCAAGCTAG GTGACAAGGT 9780 
GGGCAATCAT CGTCTCTTGG TTGTCGCCCA GTTTTATTCA GTCATCATCT ATCTCCTCTG 9840

TGCCAATGCC TCTAGCCCCC TTCAACTAGG ACTCTATCGT TTCCTCTTTG GATTGGGAAC 9900 

CGGTGCCTTG ATTCCCGGGG TTAATGCCCT ACTCAGCAAA ATGACTCCCA AAGCCGGCAT 9960 

TTCGAGGGTC TTTGCCTTCA ATCAGGTATT CTTTTATCTG GGAGGTGTTG TTGGTCCCAT 10020 

GGCAGGTTCT GCAGTAGCAG GTCAATTTGG CTACCATGCT GTCTTTTATG CGACAAGCCT 10080 

TTGTGTTGCC TTTAGTTGTC TCTTTAACCT GATTCAATTT CGAACATTAT TAAAAGTAAA 10140 

GGAAATCTAG TGCGAGTAAA AATCAATCTC AAATGCTCCT CTTGTGGCAG TATCAATTAC 10200 

CTAACCAGTA AAAATTCAAA AACCCATCCA GACAgATTGA 10240 



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13206 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

CGCTTTATCG TGGACGTGGT CAAGCCGAGA ATTTCATCAA GGAGATGAAG GAGGGATTTT 60 

TTGGCGATAA AACGGATAGT TCAACCTTAA TCAAAAACGA AGTTCGTATG ATGATGAGCT 120 

GTATCGCCTA CAATCTCTAT CTTTTTCTCA AACATCTAGC TGGAGGTGAC TTCCAAACTT 180 

TAACAATCAA ACGCTTCCGC CATCTTTTTC TTCACGTGGT GGGAAAATGT GTTCGAACAG 240 

GACGCAAGCA GCTCCTCAAA TTGTCTAGTC TCTATGCCTA TTCCGAATTG TTTTCAGCAC 300 

TTTATTCTAG GATTAGAAAA GTCAACCTGA ATCTTCCTGT TCCTTATGAA CCACCTAGAA 360 

GAAAAGCGTC GTTAATGATG CATTAAAGAA CAGTCGAGAT GAAAAAATCG TGTGACGCAC 420 

CAAGGGAGGA GTCTGCCCTT TTGAGGAAAT CTAGCGAGGA AAAACGATAC TGGAACAGCA 480 

GAAAGTAAAA CTGACCTCAT GAGGAGGAAG AAAGTGGCTC ATGAGGTCAG GGGTTTTGVA 540 

AGTTACATCT AGTTGAGAGA GGTATGAATG ATTTGGGATT AATCATTTCT TGTTTTAAAT 600

CAGGAGAATA GTAACGATTT TTTCCTTTTT TGACGAACTC TATTCCGTAA CGATCAATCA 660 

ATTTAATCAT GTACCTAATA TTAGAATTGT TTATCCCAAA TTTATTTGAA AGCTTCTCTA 720 

AGCTATATCC TTGTTTTCTA AGTTCATAGA TCTGAACTTT ATCATCATAA GTTAGTTTCA 780 

TAATAAAAAC ACCCCAAAAG TTAGATTTTT TCTGTCTAAC TTTTGGGGGG CAGTTCATTC 840 

AACACCTGAT ACTATGCGTT TTTCTTATTT GAAATACTTT TTACTCAACC TCTTTATACT 900 

CAATGAAAAT CAAAGTGCAA ACTAGAAAGC TAGCCTCAGG CTGCTCAAAA CAGTGTTTTG 960 
AGGTTGCAGA TGGAAGCTGA CGTGGTTTGA AGAGATTTTC GAAGAGTATT ACTTAATCTT 1020

CTTGATACTT TGACTAAGAA TAAATCCTAC AATCATCCCT ACCATATTTT GCATAAAATT 1080 

CGGTAGAATT TCTGGGAGGG CTGCTGCCCA GCCATTCATC AAAGCAGAAC CCAAGGCGTA 1140 

GCCTCCTACC ATGGCAATAG TTGCTAAAAT AAGGCCTAAC CACTGACTTT TTCCTTTAAA 1200 

TCCTGCGAAA AATCCCTGCA AGCCATGGTT GACCAAGCTA AAGAACATCC ACTGAGGGTA 1260 

GCCTGATAAG AGGTCAATCA AGAAACTTGC TAGTCCTCCG ACTACCGCTC CTTCACGACT 1320 

ACCAAAGTAA AAGGCCGCAA AGAAGACACC AGCATCTAAA AGAGTTAGAA TTCCTGTAGG 1380 

TGTTGGGATT TTTAAGAAAT AACCTAGAAC CACAGAAAGG GCGGTTAATA GGGATACAAG 1440 

GGCGATTTTA GTTGTTTTTG TTTGCTTCAT ATTGTCTTAC TCCATACTGA TCTGCTTGTG 1500 

CAATAGCACG ATAAACGAAA GCCTTAGAGC TTTCTACTGC TGGCAAAAGT TTATCACCTT 1560 

TAACCAGGTG ACTGGCAATG CTAGAGsCAA AGGTACAACs TGCACCAGCA TTTTGGCCTT 1620 

GGATAACTGG ATTTTCTAGG ATAGTAAAGG TCTGTCCATC ATAAAAGACA TCCACAGCCT 1680 

TGTCCTGACT AAGACGATTG CCTCCCTTGA TAATGACTGt GGCGCTCCTA AATCATGCAA 1740 

TTTCTGCGCT GCAGTTTTCA TGTCTTCCAA GGTTTTAATT TCCTGACCGG ATAATAATTC 1800 

TGCTTCTGGG AGATTAGGCG TAATCACACT GACATAAGGG AAAAAGCGAA TCAACTCTTG 1860 

GCAGAGCTCA CTGACAGCTA CATCATGCGT TTCCTTGCAG ACCAAGACAG GATCCAACAC 1920 

CACAGGTACT CCTGGGCGTT GTTTGATAAA GTCCAAGGCC TTCTCAGCCA CGCTGACAGT 1980 

AGGGAGAAGA CCAATCTTAA TTCCCCCAAA TTCCACATCA <X3CAAGCTAT CTAATTCATG 2040 

TTGAAAAATG GTATCATCAG TTGGAAAGAC TTCAAATCCT TTTTCTGTCA AGGCTGTCAA 2100 

ACAAGTCACT GCTACAAACC CATGCAAGCC GTTCAAGGTA TAGGTAGCCA AATCAGCTGA 2160 

CAGTCCACCA CCACTAAAAA TATCATTTCC AGAAAGTGCT AAAATACGAT TATTCTTCAT 2220 

AACGAATCTC CTTTAAATAC AAACCATTTG GTGCTGCAGT GGGACCTGCA AGTTGCCTGT 2280 

CCTTCTTCTC CAAGATGAGA TCAATCTGCT CTACTGGCAT GCGGTTGTTA CCGATTTTGA 2340 

GAAGAGTCCC CACCATATTG CGAATCTGTT TATACAAGAA ACCATTTCCT GAAAAGGTAA 2400 

AGGTCAAAAA TTGTCCTGTC TCATCGACTA TTAAACTAGC TTCTGTGATG GTGCGAACCT 2460

TATCCTCTAC ACTAGTCCCA GAGGCTGTAA AACCGGTAAA ATCATGGGTT CCCTCTAGCT 2520 

TTTTGATTGC AATCTGCATT CGTTCCACAT CGAGTGGGTA GGGAAAGTGG GTGGCATAGT 2580 

GACGGCGCAT CGGATTTTTG GGACGTCCTC TATCCACAGT AAACTCATAG GTCTTGCTAT 2640 

GCTTGGCATA ACGGCAATGA AAATCATCTG CCACAAGCTC AATCGAAATC ACATCAATAT 2700 
CTTCAGGAGA CTGGGTATCC AAGGCAAAAC GGAGTTTCTC CTCATCCATC TGATAAGGCA 2760

GGTCAAAATG AATCACCTGT CCCAGGGCAT GAACCCCACT ATCTGTCCTA CCAGCACCGT 2820 

GAACAGTAAT GGCTTGCCCT TTATTTAATC TGGTCAAGGT TTTTTCAATT TCTTCCTGAA 2880 

CGCTACGCGC ATGAGGCTGG CGCTGAAAGC CAGCAAAGGC ATAACCATCA TAGGAAATAG 2940 

TTGCTTTATA TCTCGTCATA GCCTCTATTT TATCAAGAAA TTAGTCTGTA AACAAGGACC 3000 

TAAAACAAAT ATTGTATGGG TATAAAAATC TCATACTCTT CGAAAATCTC TTCAAACCAC 3060 

GTCAGTTTCC ATCTGCAACC TCAACACACT ATTTTGAGCA ACCTGCGGCT AGCTTTCTAT 3120 

AGTAGATTGA AATAAGATAT GAACAACTCT ATTAGGAAAG TCAAATTAAT TTCTAGAAAT 3180 

ATTTTAGCAG CTACAGCGTA CTATTCCAAA CTCAATCAAC TATAGTTTGC TCTTTGATTT 3240 

TCATTGAGTA TCAAAAGAAA AACTTAGGAA TCAATCCTAA GCTCTCTTCT GAAGTAGGTA 3300 

CATGACAAAG ATAGAGATTA CAATCAACCA ACCTCCTAAG ATACTAAAGA CCAACATCCC 3360 

ATTGTGAGTT AGTAAGCCAA TTGCACCTAG AACGAATGGG GTCGTAAAGG CTCCGAAACT 3420

ACAGCCTAAT ACAGCAAATG AAGTTGCTTG ATTGAGGAGT TTAGCTGGAA TTCGTTCAGA 3480 

GACAAGTTGA AAGACCGTCG TCAAGACTAC ACTATAGGCA AATCCAGCCA GAACACTTCC 3540 

TGCTACTACC ACCCACAAGG ATGAAGACAA GGCAATCACG ATTTGCCCCA AGCCAAAGGT 3600 

AATACCAGAC CAGAGGAGCA GTTTCTCTTT AAAGATAGAA ATCAAGAAAG AAAAACTCAC 3660 

CCCAGCCACA ATCCCGATCA ACTGCATGAT ACTAAGAACA AAACTAGATA ACTGGGCATC 3720 

CCCCAATCCT CTTTCCACCA TCAAACTTGG AATACGGATG GTAATAGCTG TATTGGTACA 3780 

AACTACAACT GCCGCTTCGA TAGCTAAGGT AAAAATCAAG CCTTTCATTT CTCGAGTTAA 3840 

ACGACTTGCT TCCTTCGCTC TTTTCTTGAC TTCTTTCTTT GATTTTCCAT AAGGGACAAA 3900 

GAGCAGATAA AGGGGCAGCA CCAAAAATCC AGCACTATAG GCTAGAAAGA TAGCTGTCCA 3960 

ACCAAAGGCC AACAACTGAC CGACGGCCAA GGTAATGAGA GAAGCTCCAA CGACCTCTGC 4020 

AGAAGCGCGT AGCCCTAACA TCTGAATTCG CCTTTTTCCT TGGTAGCGTT CACTGATAAT 4080

AGAAATGGCC TTGGCATTGA TCATCCCAAG ACCCAAACCA AAGAGAAGCC GTGTTCCAAA 4140 

GACAAAGGGA TAGGCTTGGT ACCAGAAGGG AGCTGTACCG CTCAATGATA AAATCAGCAA 4200 

GCCCAAACTA ATCTGTAAGC GCTCAGGAAA TATTTTTTCT AAGAAACCAT TTAGCAGTAA 4260 

CATCATCATG ATTCCAAAGG AAGGCAAGCT CACCAAGAGC TCAATTTGTT CCTTAGAATA 4320 

ACCCTGATAA TAGTCAAACA TGGCTGGTAG GGCACTCGAA ATGGAAAAGG AGGTAATCAA 4380 

AACGAGGGAG AGAGCCAAAA TGCTGGCCCG TTCTAAAAAT TGTTTCATGA AATCTCTTTC 4440 

TATATTTCTC TTAATCTTCT ACTTTTTTGA TAGTTATCAA ATAAGCAAGA AAAGAAGAAG 4500 
CCTCATTGGT TTGTAGACTC CTTCTTAAAT TCGAAAATGA ATCCCTTGTA TCTTATACTC 4560

AATGAAAATC AAAGAGCAAA CTAGGAAGCT AGCCGCAGGT TGTTCAAAAC AGTGTTTTGA 4620 

GGTTGCAGAT GGAAACTGAC GTGGTTTGAA GAGATTTTCG AAGAGTATTA GGATGACTTT 4680

CTCTTGATTT GCTTGATAAA GTAGAAAATA AATCCTGCTA CCATATAGGC AACAAAGATA 4740 

ATCAGACACC ACTTAAACAC AACATTCCAA CCCTTGTTCA CATTCAAAAA GAAGTAAGGG 4800 

AAAGGATTAT CCTTGGCATT TGGAATATTG AGTTTTAGAA CCAAGCCATT AAAAAGAGCA 4860 

AACATCATAT ACAGAAAGGG TAAAATGGTC CACACTGCTG GATCCCAAAT CTTGTATTGA 4920 

CCCTGTTTGT CAAAAAAGAG GGTATCCGCT AAAAACCAGA TGGGAACGAT ATAGTGGCAA 4980 

AGGAAATTTT CTAGGGTATA GAAATTAGTC GCAATGGGCG CCAAGAGGAA ATGGTAAATC 5040 

ACACAGGTAA TCATGATACT CATGGTGACC CCACCTTTTA AGCGCAAGAG ACTTGGCCTT 5100 

TGCCAATTTT CACCTACACG GCTCATAACC TTTAGAAGAT AAAGGGTAAA AATAGTTACC 5160 

AAGAGGTTGG ACAGAACCGT GTAATAGAGA AGCATCCCAA AACCACCATG CTTAGTAATT 5220 

TCAAGATAAA CTCCCGTAAA AGCCGCTAGA AACAAGAAGA TACGGCTATA AAATACAAGT 5280 

TTATAGTGTT TTGACATGCT TAAATCTTCC TCACAAACTC TGATTTAAGT TTCATGGCAC 5340 

CAAAACCATC AATCTTACAG TCGATATTGT GGTCGCCTTC TACGATGCGG ATATTTTTCA 5400 

CGCGCGTCCC TTGTTTCAAA TCTTTTGGCG CACCTTTTAC TTTCAAGTCC TTGATGAGAG 5460 

TTACTGTATC ACCATCAGCC AATTTATTTC CGTTGGCATC GATAGCGACA AGACCTTCTT 5520 

CTACTTCTGC AACTTCAGCA GGATTCCACT CATGAGCACA CTCTGGGCAA ACCAGTAGGG 5580 

CACCGTCTTC GTAGACATAC TCTGAGTTAC ATTTTGGACA ATTTGGTAAA TTGTTCATGG 5640 

TTTCTCCTTA TCATCATTCA CTATTCTTTG AAAATCAAAA TTTCTCGAAC AGCAACTATT 5700 

ATACCCTAAA ATCAGCATTT TGACAAATTT AGAAAAAAAC CGATATCAAT CTATCGGCTT 5760 

TTCTACATTT ACATTCTTTT TTCAGCTTCT GCTTTGATTT TTTCAACTAC TTCTTGAATG 5820 

TTCAAACCAG TTGTATCAAG GTAGACAGCA TCCTCTGCTT GTTTGAGAGG AGAAGTCTCA 5880 

CGATGACTAT CCTTGTAGTC ACGCGCAGCA ATTTCCTTTT TTAGGGTTTC AAGGTCTGTT 5940 

TCAATTCCCT TGGCAATATT TTCCTTGTAA CGACGCTCTG CTCTCTCATC AACAGAAGCT 6000 

ACTAGGAAAA TTTTCAATTC TGCTTGTGGC AATACAACAG TTCCAATATC GCGACCATCC 6060 

ATGACAATCC CGCCTTGCTG GGCAATTTCT TGTTGGAGAG AAACCAGTTT CTCACGCACT 6120 

TGAGGAATTG CTGCAATAGC AGAAACATGA TTGGTCACTT CATTTTCACG GATAGGATGG 6180 

GTAATATCCA CATCTCCTAC AAAAACAAGC TGGTCTCCAG TTTCTGAACG TCCAAAGCTG 6240 
ATTGGATGCT GGTCCAACAA GGCTAGAAGG GCTTCGACTT CTTCAACTCC TAATTGGTTC 6300

TTAAGAGCCA TATAGGTCGC TGCACGATAC ATAGCTCCTG TATCAAGGTA GGTGAATCCA 6360 

AAATCCTTAG CAATAATCTT TGCGACCGTA CTCTTACCGC TGGAAGCAGG ACCATCAATA 6420 

GCAATTTGAA TTGTTTTCAT ATCGGCTCCT ATTTTATTTT TATAACATCA CCTGGATTAG 6480

CAAACCAAGA TCCTGTAGCC ATGTGCCCAG GATTCAAGGC CTCTAACTGA GCAATGGAGA 6540 

TTCCTGCACG AGCGGCAATA GCTGCTTCCC CTTCTCCTGC GAGAACTTTA ATCGTTCCTT 6600 

CAGGATTAGC AGCTTCTTCT GAACTACTAG AAGTAGATTC TGGCTCTGAA CTCTGCTCAG 6660 

GCTGAGAACT ACTTGAAGAT GAGATTTGTA CTACACTGGC ATCAGAATCA TGAAAGCCTT 6720 

TTAAGGCTGC TGTGCGATTA CTCCCCCCCG ATGATAGATA GATGAGAACG ATGACCATCA 6780 

CCACCACAAT TACAAAGAAA ATACTAGCTA GGATCGTCAA AATACGATTA GCCATCCTAT 6840 

CAGCCCCTCC GTGGTTTCGA TGCCGACGCT CTGCTCTTGA TTCTTCTTGA TCATAGATAT 6900 

CTTCTTGCCA CGGTTCTTTT GCCATACCTT ACTCCTTGTT TTTTTTTACT TTTCTTATTA 6960 

CAATATAAAT ATGAACATGA AAATCACACT TATACCTGAA CGATGTATCG CCTGTGGGCT 7020 

TTGCCAAACT TATTCTGATT TATTTGATTA CCACGATAAT GGAATCGTGC GTTTTTACGA 7080 

TGACCCTGAC CAACTGGAAA AAGAAATTTC TCCTAGTCAG GATATCTTAG AGGCTGTTAA 7140 

AAATTGCCCA ACTCGCGCCC TGATTGGAAA CCAGGAAGCC TAAATCAATG GCGATAATCC 7200 

ACTCCCTCTA GTTTAGCACA TTTCCATGTA AAATTATAGT CTTTTCACTT TATTTTTTTC 7260 

TGTAAAATCA GGAAGGTCAC TTTTTTCTTT GATAAGATAA AGTGGTCTTT TTTTAGTCTC 7320 

TAAATAAATC TTACTGATAT ACTTGCCGAG AATCCCAATG GTCAAGAGTT GAATGCCTCC 7380 

AAGAAAGAGA ATAACAGCCA TCAGAGAGGT CCAACCAGAT GTCGGATTGC CCAAAATGAG 7440 

GGTCCGAACC ACAACAAAAA AGGTCATCAG CAGAGAAAGA AAACAAGATA GGAGACCAGC 7500 

TACAAAGGCT ATAATCAAGG GAAAATCTGA AAAATTAATA ATCCCTTCAA TGGAGTAGAA 7560 

AAAGAGTTGC CTAAAACTCC AACTTGTCTT GCCAGCCTGC CTTTCGACAT TTGGATAGTC 7620 

CAAATAGTAG GTTTTGAAAC CCACCCAGGC GAAGAGCCCC TTTGAAAAAC GATTGGACTC 7680

GGTCAAGCTT AAAATGGCAT CGACTACAGA CCTTCTCATC ATACGAAAAT CACGGACACC 7740 

CGACGGCAGA GCTACTGGGC TGATTTTTTG CATGAGGCGA TAAAAGAGAA CAGCACAGAA 7800 

ACTGCGAAAG AAGGGTTCTC CCTCCCGACT AGTTCTCCGT GTCCCAACGC AGTCCAAGTC 7860 

TACATTTTTG TCTAATACAT TTTTCATCTC AAACAAGATA CTAGGAGGAT CTTGGAGGTC 7920 

TGCATCCATC ACCACCACCA AATCTCCTGT CGCATATTGC AAGCCTGCAT AAAGGGCTGC 7980 

TTCTTTGCCA AAATTTCGAG AGAAAGAAAT ATAATGGACT GCCGGATTTT GCTCCCGATA 8040 
GGCCTTTAAG AGTTCCAAGG TCCCATCACT TGATCCATCA TCGACAAAGA CATACTCGAT 8100

TTCTGTTTCC AAATCTGGAA GTAAAGCTTC CAGAGCCTGA TAAAAAAGAG GAAGTACTTC 8160 

CTCTTCGTTT AAACAAGGGA CGATGATTGA AATCATCATC TTAGTCTTCA AATCCATTTG 8220 

GATGCTTGCT TTGCCAACGC CATGCGTCTT CACACATTTG GGTGATGTCG AGTTCTGCTT 8280 

CCCAACCGAG TTCTGCTTTA GCTTTTGCCG GGTCTGAGTA GCAGGCAGCG ATATCACCTG 8340 

GGCGACGTTC TACGATGCGG TAAGGAATAG GACGGCCCAC CGCTTTTTCC ATGTTTTGGA 8400 

TAATTTCAAG AACTGAGTAA CCTTTACCAG TTCCAAGGTT ATAAACGTTT AGTCCTGAAC 8460 

CTTTTTGGAT TTTTTTCAAA GCTGCAACGT GACCCTTAGC CAAATCGACA ACGTGGATAT 8520 

AGTCACGAAC ACCTGTTCCA TCTTCCGTAT CGTAATCGTC TCCAAACACT TGCACTTGCT 8580 

CTAATTTTCC AACGGCTACT TGAGTCACAT ATGGCAAGAG ATTGTTTGGA ATACCGTTTG 8640 

GATTTTCTCC CAAATCACCA CTCTCATGGG CTCCGATTGG GTTAAAGTAA CGAAGCAAGA 8700 

CAACATTCCA TTCTGAGTCT GCTTTGTAAA TATCAGTCAA AATTTCCTCT AGCATGAGCT 8760 

TAGTACGACC GTATGGGTTG GTCACTGAAA GTGGGAAATC TTCCAAGATG GGCACTGTGT 8820 

GCGGATCCCC GTAAACTGTC GCAGAAGAAC TGAAGATGAT GTTTTTACAG TTGTTTTCTT 8880 

CCATGGCTTT CAAAAGGCTG ACAGTTCCAG CGATATTGTT GTCATAGTAG GCAAGAGGGA 8940

TACGTGTTGA TTCGCCAACA GCCTTCAAAC CAGCAAAGTG AATGACACCA GTCGGTTCTT 9000 

CCTGCTTGAA AATATCTCTG AGGGTATCTG TGTCACGAAT ATCTGCCTCA TAGAAAGGAA 9060 

TCTCAACTCC TGTGATTCCT TCAACAACTT CTAAACTCTT ACGATTGCTA TTGACAAGAT 9120 

TATCCACCAC AACAACTTGA TGACCTGCTT GGATCAATTC AATAACAGTG TGGGTTCCAA 9180 

TAAAACCGGC ACCACCAGTT ACCAAAATCT TTTCTTGCAT CTTTTTTCCT CGATTCTCAG 9240 

ATTATTTTTT CTTATTTTAC CATTTTTGAC AGGGAATGTC ATTTGCCATC CTAAACTACC 9300 

TGATAAAATT TCAGTAAAAT GCTTATACTC TTCGAAAATC CAATTCAAAC TACGTCAACG 9360 

TCGCCTTGCC ATGGGTATGG TTACTGACTT CGTCAGTTCT ATCCACAACC TCAAAACAGT 9420 

GTTTTGAGCT GACTTCGTCA GTTCTATCCA CAACCTCAAA GCAGTGCTTT GAGTAACCCG 9480 

CGGCTAGTTT CCTAGTTTGT TCTTTGATTT TTATTGAGTA TTATTCGCTT TTTACTCGTT 9540 

TGACATAGTT TTCAATTGGG TAATTTAGAG GGTCCAAGGT CAACTCCTTG TCTTGGATCA 9600 

GTTGGGCTAG ATGGTAACCA ATGATAGGAC CAGTTGTGAG GCCTGATGAA CCTAGTCCAC 9660 

TGGCTGCATA GACACCAGTT AAGTCAGGCA CCTGCCCAAA GAAAGGAGAG AAATCACTGG 9720 

TGTAGGCACG GATTCCAACA CGCTCAGATT TTGAAGTAGC TTCAGCCAAA ATCAGATAGT 9780 
GAGTCAAGGT GGCCTCCTCC ATTTGTTGGA GCAAGGTTTC ATCTACCGTC AAATCAAATC 9840

CCATGTCATT TTCGTGGGTA GCGCCTAAGG ATAATTTCCC ACCTGCAAAG GGAATCAAAT 9900 

CCCACTCCCC TTCTGGCATG ACAACAGGGT AATCTTCCAT GTCTTGGGCA AGCTGATAAT 9960 

CTCGTAGTTG TCCTTTTTGA GGACGGACAT CCACTTCATA ACCTAAAGGC TCTAACATGT 10020 

CCCCCAACCA AGCTCCCGTC GCCAAAATAA CCTGCTCAAA CTCCTCTTCA CCAATCTGGT 10080 

AGCCTGATGC TAACGGTGTC AGAGTCACTT TTTCTTTGAC CAGCTTGACA TGACTGACTT 10140 

CCAGCAAACG AGTCACTAAA AGTTGGCCAT CTACTCTCGC TCCACCAGAA GCATAGAGCA 10200 

GGCGGTCAAA TCCCTGCAAA CCAGGGAATA ATTCATTAGC TGAGGCTTGG TTCAGAATGG 10260 

CTAATTGCCC TATCAAGGGA GATTCTTCTC TGCGCTGGAG GGCCAGTTGA TAAAGTTCTT 10320 

CCAAATTGGA TTCATCCTTT TTCAAGAGAA AGACTCCCGA ACGCTGGTAA AAGTCGATTT 10380 

CTTGTCCTGA TTTCTCTAAA TCAGCTAATA AATCCACATA AAAATCAGCC CCCAAGCGCG 10440 

CCATCTTGTA CCAGGCTTTA TTACGGCGTT TGGAAAACCA AGGACTGATA ATTCCTGCTG 10500 

CGGCCTTGGT GGCTTGACCT TGCTCATGGT CAAAAACGGT CACCTCTAGG TCACTTTCTC 10560 

TCGAGAGGTA GTAGGCAGCT GTTGCTCCCA CAATTCCTGC TCCAATAATG GCAACTTTTT 10620 

TCATTGTCTT CACTTTCTAA CTAGATATGA TGGAAAGGAT TGGTTGATGC CTGACTAGGC 10680 

AAGATATCAA TAGACCACCC CTTATCTTCC TTCCATTGAC TAAGAAGTGC TGCGATTTTT 10740 

TCTACAAAAA TCACTTCGAT ATAGTGACCT GGGTCCAATG CAAGCAACCC ATCAGATAGC 10800 

ATATCCTGAG CAGTATGGTA GTAGATATCA CCAGTGATAT AGACATCTGC CCCCTTTGCC 10860 

AAAGCATCCT TATAGAAAGA CTGCCCGCTT CCACCACAAA TTGCTACTCT TGAAATAGGC 10920

TTCTGCAAAT CATCCTCTTG ATAATGCACC ATTCGAAGGC TATCTAGGTC AAAGACTTGC 10980 

TTGACCTGTT GGGCCAATTC CCAAAATGTC TGAGGCTGAA TATTCCCAAT ACGTCCAATT 11040 

CCACGTTCTG GACCTGTTTC CTGCAGATAA GTCGTCTCCT CGATTCCTAG CATCTGACAA 11100 

AACCAGTCAT TGAGCCCATT TTCAACGATA TCAATATTGG TATGGCTGAC ATAAACTGCG 11160 

ATATCATGCT TAATCAGGTC GATGTAAATC TGATTTTGCG GACGGCTGGC AAGCAAGTCC 11220 

TTGATAGGAC GAAAGATAGG CGCGTGCTTG ACGATAATCA AGTCCACACC CTTTTCAATG 11280 

GCCTCTGCCA CTGTCTCTTC ACGAATATCG AGGGCAACCA TGACCCTTTG GATACCCTTG 11340 

TCTAAAGTGC CAATTTGCAG ACCACGGCTG TCTCCCTCCA TAGAAAATTC CTGAGGGCAA 11400 

AAGGCTTCAT AAGCTTGGAT CACTTCACTT GCTAACATGG AGCACCTCCT TGATAGCTTG 11460 

AATCTTATCT ACTAGAACTT GACGTTCTTC CAGATTTTTT TCTGGGATTT GTCCGAGGGC 11520 

GAACTCTAGC TTCTCAGCTT CTTTTTGCCA TTTTTGGACA AATACTGGAC TGACTTCTTT 11580 
GGACAAGAAG GGACCAAAGC GAACATCACT GGCTGATAGC TTCATTTGTC CTGCTTCCAC 11640

CACCAAAATC TCATAAAACT TTCCAGCTTC TTCTAAGATG CTTTCTGCTA CAATCTGGAA 11700 

TCCATGATCC TGTAGCCAGA TACGCAAGTC GTCTTCACGA TTATTGGGCT GGAGGATCAA 11760 

ACGCTCTACA TTAGCTAACT TCCCCAAACC TTCTTCTAAA ATCCTAGCAA TCAAACGACC 11820 

ACCCATGCCA GCAATGGTAA TGACAGACAC TTGGTCAGTC TCTTCAAAAG CTGCCAAGCC 11880 
ATTGGCTAAA CGGACTTGGA TTTTCTCCTT TAGGCCGTGA GCCTCAACAT TTTTAACCGC 11940

AGACTGATAG GGACCTTCCA CCACCTCACC TGCAATAGCG CTTTTGATTT GGCCTCTCTC 12000 

AACCAACTCG ATAGGCAGAT AAGCATGGTC ACTTCCCACA TCTAGTAAAA TAGCCCCCTG 12060 

TGACACAAAG GAAGCTACCA ATTCTAATCT CTTTGAAATC ATCTTCTCTC ACTTTCCAAA 12120 

ACTCTATTAC CTCTTATTAT ACCACATTTC AATCTTCAAC TTCCCAGTAA TATAAGCACC 12180 

TCTGGCGAAA GAAGTTTCAA TGTCCTAAAG TAATAAGTGA ATCCAATTGA AAGATTTTAA 12240 

ACAATTTGCA AAAATGTCAA AAAATAAAAA ATAAACAGTT TATTCAGAAA ATTCTTGACA 12300 

TATAAAAACA CATGGTAGAA TATAATTAGA AAGTTAGAAA AAATAAAAGT TTGACTAAAA 12360 

TTTGTATTTG AAGGTGGTGT TCAGATAAGA AATTTAGTCA GACGAACCAC GAATTTGCTC 12420 

TATGCTTTCT GGAATTTATC ATAACAGGAG GATACAGTCA TGGAACAAAC ATTGTTTGAA 12480 

TTAGAACTAC TTCCAGAGGA AGATATCATT GTCACAGGTC TCCCTAAGTA TTGTTCTTTT 12540 

ACTTGTTTAA TTACAGGTCG CTAGTTATAT TTTATATAAA ATAAGTAGCT TTACTTACGG 12600 

AATAGGCTAG TGCTGTGTCT CTAGCCTATT TTAATAATTA GGAGTTTGTT ATGGATTTAT 12660 

TAGAGAAAGA ATGTTTAAAA TGTGATAAAA ATTTCCAACA GGGTGATATT TGGAATTACT 12720 

ATTATTTATC AGATAAGATG CCTGCACAAG GGTGGAAAAT ACACATAAGC TCCCAAATAA 12780 

AAGACGCTGT AAATATTTTT AAGATTGTGT ATAAACTATC CCAACTAAAT AATTGTAGCT 12840 

TTAAAGTTGT TAAAAATTTA GAGGAATTAA AAAAAATTAA TTCCCCTAGG GAAATGAGCC 12900 

CTACTGCTAA CAAATTTATA ACTCTATATC CTAAGTCAGA ATCTGAAGCT AAGAGTATGA 12960 

TTTGTAATCT TACGAATAGA CTGTCAGAAT TTAAGGCTCC AAAAATACTA TCTGACTATC 13020 

AATGTGGAAT GCATTCTCCA GTTCATTATA GATATGGGGC TTTTTTAAAA AAACAAGCTT 13080 

ATGATGAAAA AAATAAAAAA GTCATCTATT TATTGCTAGA TGAAAAAAGG AAGAACTATG 13140 

TAGAAGATAA GAGACAAAAT TTCCCTAGTC TTCCTAGCTG GAAAATGGAT TTATTTTCAG 13200 

AAGAAG 13206 
(2) INFORMATION FOR SEQ ID NO: 34: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13104 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

CCGGATCCAG CGAAAAATAT GCTCTTTGAT GCTGTAAGTG GTCAAAAAGA TGCTAAAACA 60 

GCTGCTAACG ATGCTGTAAC ATTGATCAAA GAAACAATCA AACAAAAATT TGGTGAATAA 120 

AAAATTTGTT CAAGGGGGGT GGAAATCAAA TCCCCCTTTG AATTTATCAA TAGAGACACA 180 

AATAATTTAG CTTTCTTATA AAAAAGTAGT ATCCTATGAA AGGAGTTAAT ATGGAAAAGC 240 

AACAACCTAG TAAAGCAGCC CTGCTGTCTA TCATTCCTGG GTTAGGACAG ATTTACAATA 300 

AACAAAAAGC CAAAGGTTTT ATCTTCCTTG GTGTAACCAT CGTATTTGTC CTTTACTTCC 360 

TAGCACTTGC AACCCCTGAA TTGAGCAACC TCATCACTCT TGGTGACAAA CCAGGTCGTG 420 

ATAATTCCCT CTTTATGCTG ATTCGTGGTG CCTTCCATCT AATCTTTGTA ATCGTTTATG 480 

TACTCTTTTA TTTCTCAAAT ATCAAAGATG CACATACGAT TGCAAAACGC ATTAACAATG 540 

GAATTCCAGT TCCACGCACA CTCAAAGACA TGATCAAAGG GATTTATGAA AATGGCTTCC 600 

CTTACCTCTT GATCATTCCA TCTTATGTTG CCATGACCTT CGCGATTATC TTCCCAGTTA 660 

TCGTAACCTT GATGATCGCC TTTACCAACT ACGACTTCCA ACACTTGCCA CCAAACAAGT 720 

TGTTGGACTG GGTTGGTTTG ACCAACTTTA CAAACATTTG GAGCTTGAGT ACCTTCCGTT 780 

CTGCCTTTGG TTCTGTTCTT TCTTGGACTA TCATTTGGGC TTTGGCAGCT TCTACTTTAC 840 

AAATCGTAAT TGGTATCTTC ACAGCTATCA TTGCCAACCA ACCATTTATC AAAGGAAAAC 900 

GTATCTTTGG TGTTATTTTC CTTCTTCCTT GGGCTGTCCC AGCCTTCATC ACTATCTTGA 960 

CATTCTCAAA CATGTTTAAC GATAGTGTCG GTGCTATCAA CACTCAAGTA TTGCCAATCT 1020 

TGGCTAAATT CCTTCCTTTC CTTGATGGAG CTCTTATTCC TTGGAAAACA GACCCAACTT 1080 

GGACTAAGAT TGCCTTGATT ATGATGCAAG GTTGGCTCGG ATTCCCATAC ATCTACGTTC 1140 

TGACCTTGGG TATCTTGCAA TCTATTCCTA ACGACCTTTA CGAAGCAGCT TATATTGACG 1200 

GTGCCAACGC TTGGCAAAAA TTCCGCAACA TCACTTTCCC AATGATTTTG GCTGTTGCGG 1260 

CACCTACTTT GATTAGCCAA TACACCTTCA ACTTTAACAA CTTCTCTATC ATGTACCTCT 1320 

TCAATGGTGG AGGACCTGGT AGTGTCGGAG GTGGAGCTGG TTCAACCGAT ATCTTGATCT 1380 

CATGGATCTA CCGTTTGACA ACAGGTACAT CTCCTCAATA CTCAATGGCG GCAGCTGTTA 1440 

CCTTGATTAT CTCTATCATT GTCATCTCAA TCTCTATGAT CGCATTCAAG AAACTACACG 1500 
CATTTGATAT GGAGGACGTC TAAGATGAAT AACTCAATTA AACTCAAACG TAGACTGACT 1560

CAAAGCCTTA CTTACCTTTA CCTGATTGGT CTATCAATTG TAATTATCTA TCCACTGTTG 1620 

ATTACCATTA TGTCAGCCTT TAAAGCAGGT AACGTCTCAG CCTTTAAACT AGATACTAAT 1680 

ATCGACCTCA ATTTTGATAA CTTTAAAGGC CTCTTCACTG AAACCTTGTA CGGTACTTGG 1740 

TACCTCAACA CTTTGATTAT CGCCTTAATT ACCATGGCTG TTCAAACAAG TATCATCGTA 1800 

CTTGCTGGTT ATGCTTACAG CCGTTACAAC TTCTTGGCTC GTAAACAAAG TTTGGTCTTC 1860 

TTCTTGATCA TCCAAATGGT GCCAACTATG GCCGCTTTGA CAGCCTTCTT CGTTATGGCG 1920 

CTTATGTTGA ACGCCCTTAA CCACAACTGG TTCCTCATCT TCCTCTACGT TGGTGGTGGT 1980 

ATCCCGATGA ATGCTTGGCT CATGAAAGGC TACTTCGATA CAGTGCCAAT GTCTTTAGAC 2040 

GAATCTGCAA AACTAGACGG TGCAGGACAC TTCCGCCGCT TCTGGCAAAT TGTTCTACCA 2100 

CTTGTTCGCC CAATGGTTGC CGTACAAGCT CTCTGGGCCT TCATGGGACC TTTCGGGGAC 2160 

TACATCCTCT CTAGTTTCTT GCTTCGTGAG AAAGAATACT TTACTGTTGC CGTAGGTCTC 2220 

CAAACCTTCG TTAACAATGC GAAAAACTTG AAGATTGCCT ACTTCTCAGC AGGTGCTATC 2280 

CTCATCGCCC TTCCAATCTG TATTCTCTTC TTCTTCCTAC AAAAGAACTT TGTTTCAGGA 2340 

CTTACAAGTG GTGGCGACAA GGGATAATTT ATCCCCGCCA CCCTTTTTCA TTTTATACTC 2400 

TTCGAAAATC TCTTCAAACC ACGTCAGCTT TATCTCCAAC CTCAAAGTTG TGCTTTGAGC 2460

AACCTGTGGC TAGTTTGCAC TTTGATTTTC ATTGATTATT AGCAATTGTC ACTGTAAATA 2520 

ATATCCTTGT AGCAAGCAAT TTTTCTCCTA GACTTGAAAT AAAGCGCATT TCTCTATATA 2580 

ATAATACTCA TATAGAAAAC ACCTTTTAGA AAGATACCTA TGCTTCCATA TCCATTTTCC 2640 

TATTTTTCAA GTATTTGGGG GGTTCGTAAG CCCCTGTCCA AACGTTTCGA GCTCAACTGG 2700 

TTTCAACTTC TCTTTACCAG TATCTTCCTT ATCAGCTTGT CTATGGTACC CATTGCTATC 2760 

CAAAACAGCT CCCAGGAGAC CTATCCGCTA GAAACTTTTA TCGATAATGT CTATGAACCT 2820

CTGACAGATA AGGTTGTCCA GGATCTCTCT GAACATGCTA CAATTGTCGA TGGCACATTA 2880 

ACTTATACTG GAACAGCTAG TCAAGCCCCT TCTGTTGTGA TTGGTCCAAG TCAAATCAAG 2940 

GAATTACCTA AGGACTTGCA ACTGCATTTC GATACAAATG AGCTAGTCAT CAGCAAGGAA 3000 

AGCAAGGAAC TGACCCGCAT CTCTTACCGA GCCATTCAGA CTGAGAGTTT CAAAAGCAAA 3060 

GACAGCTTGA CCCAAGCAAT TTCTAAAGAC TGGTACCAAC AAAATCGTGT CTATATCAGC 3120 

CTCTTCCTAG TTCTCGGTGC GAGCTTCCTC TTTGGTTTGA ATTTCTTTAT CGTCTCTCTT 3180 

GGAGCTAGCT TTCTCCTTTA TATCACCAAA AGATCACGCC TCTTTTCATT TAATACCTTT 3240 
AAAGAGTGCT ACCATTTTAT CTTGAACTGT TTAGGATTGC CGACTCTGAT TACACTTATT 3300

TTGGGATTAT TTGGCCAAAA TATGACAACC CTGATTACTG TACAAAATAT TCTTTTTGTT 3360 

CTGTATCTGG TCACTATCTT TTATAAAACA CATTTCCGTG ATCCAAATTA CCATAAATAG 3420 

GAGATTTTTA TGCCCGTTAC GATTAAAGAC GTGGCCAAGG CTGCTGGTGT TTCGCCTTCA 3480 

ACCGTAACCC GTGTTATTCA AAATAAATCA ACCATTAGCG ACGAAACAAA AAAACGTGTT 3540 

CGCAAAGCTA TGAAGGAACT CAACTACCAC CCAAACCTCA ACGCTCGTAG CTTGGTAAGC 3600 

AGCTATACTC AGGTTATCGG ATTAGTTCTT CCTGATGACT CAGACGCCTT CTACCAGAAT 3660 

CCTTTCTTTC CATCGGTTCT ACGTGGCATC TCTCAAGTCG CATCTGAAAA CCACTATGCC 3720 

ATTCAGATAG CAACAGGGAA AGATGAGAAG GAGCGTCTCA ACGCTATTTC ACAAATCGTC 3780 

TACGGCAAGC GTGTAGATGG GCTAATTTTT CTCTATGCCC AAGAAGAAGA CCCTCTCGTA 3840 

AAACTCGTCG CAGAAGAACA GTTCCCCTTC CTTATCTTAG GTAAATCTCT ATCTCCTTTC 3900 

ATCCCACTTG TCGACAACGA CAATGTTCAA GCTGGTTTTG ATGCGACTGA ATATTTCATC 3960 

AAAAAAGGCT GCAAACGCAT TGCCTTTATC GGAGGAAGTA AAAAGCTCTT GGTGACCAAA 4020 

GACCGTTTAA CAGGCTATGA ACAGGCGCTT AAACATTACA AACTTACCAC TGACAACAAT 4080 

CGCATCTACT TTGCCGACGA GTTTCTGGAA GAAAAGGGCT ATAAATTTAG CAAGCGATTA 4140 

TTCAAGCACG ATCCACAAAT TGATGCTATC ATCACAACCG ATAGCCTCCT AGCTGAAGGT 4200 

GTTTGTAACT ATATTGCCAA ACACCAGCTG GATGTCCCTG TTCTCAGCTT TGACTCGGTT 4260 

AATCCCAAGC TCAACTTGGC AGCCTATGTC GATATCAATA GTTTAGAGCT TGGTCGTGTT 4320 

TCCCTTGAAA CTATTCTCCA GATTATTAAT GATAATAAAA ACAATAAACA AATTTGTTAC 4380 

CGTCAATTGA TCGCCCACAA AATTATCGAA AAATAAGAGA CTGGGCAAAA AGTCGTTAAA 4440 

AGCAAAAACG CATACTATCA GGTATTGAAA AAACTTGATA CTATGCGTTT TATTGTGGGA 4500 

AGATTTACTT CCTTTTCTAC TGAAATTGAG TCTTTTCCCA AGATCTTTTT ATACTCAATG 4560 

AAAATCAAAG TGCAAACTAG GAAGCTAGCC GCAGGTTGCT CAAAACACTG TTTTGAGGTT 4620 

GTAGATGAAA CTGACGAAGT CAGTAACCAT ACCTACGGCA AGGTGAAGCT GACGTGGTTT 4680 

GAAGAGATTT TCGAAGAGTA TTAATCACTA ATTATCTATC TCAACAAATC TTCCTAGAAT 4740 

ATGAACATTT TCCGAGACAG AGACAAAGGA GCTTGGATCC ACTTGTGTCA TAATCTGTTT 4800 

AAATTCATTA AACTCTGCAC GTGTAATGAC AGTGATTAAA ACTGCCTTTC TCTCGTGATT 4860 

ATAGGTTCCT TCTGCATCGT GGATCATGGT TGCTCCGCGG TGCAATTTTT TATGGATTTT 4920 

TTCAATTACC TTCTCTGGAT GATTTGTCAC AATCATGGCC TGCATAOGCT TTTGCTTAGT 4980 

AAAGACTGCG TCTGTCACAC GGCTAGAGAC AAAGATGGTA ATCATAGAAT AAAGAGCGTA 5040 
TTTCCAACCA AAGGTCAAAC CTGCTATCAG CATGATAGTT CCATTTACCA AGAAAGAAAT 5100

ACTACCGACA TTCTTACCCG TTTTCTTACG AATAGTCAGG CTGACGATAT CCGTCCCACC 5160 

ACTGGAGATA TTGTTTCGAA GAGCAAAACC AATCCCCAAA CCCATAACAA CACCCCCAAA 5220 

AAGGGAATTG ATAATGGGAT CCTCTGTCAA GGTTGCCACA GGGACAAACT GGATAAAGAA 5280 

GGAACTCATA GATACCGTGA TAAAGGTAAA GACGGTGAAC TTATGGCCAA TCTGATACCA 5340 

AGCTAAGACC ATCAAAGGGA AGTTAATGGC GTAGAAGCTT AGCGAAATCG GAATATGAAA 5400 

ACCAAACCAG TGATTACTCA AGGCAGAGAT AATCTGTGCC AGACCTGTTG CACCACTCGA 5460

ATACACATGC CCTGGTTGGA AAAAGAAATT AACTGCTACT GCTGATAAAA AACCATAGAC 5520 

CAGAGAGGCC GAAATCTTCT CATCATACTT TTCTCGAGAG ATACTTTGTA AGACACGTAA 5580 

AATTTTTATC TGATAAGCAA AGCGGCGCAG ATAATAGCGC CACCGCTTAA TTCGTTTTGT 5640 

TTGTTTCATC TTCTTCTACT TGTAAGCTGA GTTCCTCTAG TTGTTTGAGA GCGACTGTTG 5700 

ATGGAGCTTG TGTCATTGGG TCAGTTGCCT TGTTGTTCTT AGGAAAGGCA ATGACTTCAC 5760 

GGATATTTTC TTCTCCAGCA AGCAACATGA CAAAACGGTC AAGCCCGATA GCCAAACCAC 5820 

CGTGTGGTGG GAAACCATAG TCCATGGCTT CAAGAAGGAA ACCAAACTGG TCATTGGCTT 5880 

CTTCAGTTGA GAAACCAAGA GCCTTGAACA TGCGTTCTTG AAGGTCTTTT TGGTTGATAC 5940 

GAAGGCTACC ACCACCAAGC TCATAACCGT TCAAGACGAT ATCGTAAGCA ATGGCACGAA 6000 

CCTTAGCCAA ATCACCTTCT AATTCATGAG CAGTCTCTTC CTGTGGAAGT GTGAAAGGAT 6060 

GGTGGGCGCT CATGTAGCGG CCTTCTTCTT CAGACCATTC AAACATCGGC CAGTCAACCA 6120 

CCCAAAGGAA GTTGAACTTA TCATTATCAA TCAAGCCAAG CTCTTTAGCA ATACGTCCAC 6180 

GAAGGGCACC CAGTGTTGCA TTAGCCACTT CAAGCGTATC CGCCACAAAG AGAACCAAGT 6240 

CCTTATCTTC AAGAACAAGC GCTGTTGTCA ATTCTTCTTG GATACCAGTC AAGAACTTGG 6300 

CAACTGGTCC GTTTAATTCT CCATCAACCA CCTTGACCCA AGCAAGACCT TTGGCACCAT 6360 

ACTGTTTGGC TACTTCCGTC ATCTTGTCGA TGTCTTTACG TGAATAGTTG TCCGCAGCTC 6420 

CTGTGACCAC AATCGCTTTT ACAGCAGGTG CTTCTGAAAA GACTTTAAAG TCTACACCTC 6480 

GGACCACTTC TGTCAAGTCC TGAAGCAACA TGTCAAAACG AGTATCTGGC TTGTCAGAAC 6540 

CGTAAAGAGC CATAGCATCA TCGTATTTCA TACGAGGGAA TGGTAGCGTT ACTTCGATGC 6600 

CTTTTGTTTC CTTCATCACG CGCGCGATCA AGCTTTCTGT AATATCTTGG ATTTCTTGCT 6660 

CAGTAAGGAA GGACGTTTCC AAGTCGACCT GAGTAAATTC AGGCTGGCGG TCTCCACGCA 6720 

AGTCCTCGTC ACGGAAACAT TTAACGATTT GGTAGTAACG GTCAAAACCA GCATTCATCA 6780 
AGAGCTGTTT CGTGATTTGT GGACTTTGAG GAAGAGCGTA AAAATGCCCC TTATTAACAC 6840

GAGACGGCAC TAAATAATCA CGCGCCCCTT CAGGCGTTGA CTTAGAAAGG AATGGTGTCT 6900 

CCACGTCGAT AAACTCCAAC TCATCCAAGT AGTTGCGGAT AGAGTGGGTC ACCTTGGCAC 6960 

GAAGTTTAAG ATTTTCCAAC ATTTCTGGAC GACGAAGGTC AAGGTAACGG TAACGCAAAC 7020 

GTGTATCGTC ATTTGCCTCA ATGCCATCCT TAATCTCAAA TGGTGTTGTC TTAGCTGTGT 7080 

TAAGCACAAT AAGAGCTGTC ACGTTTAACT CAACCGCACC AGTTGGCAAC TTATCATTGG 7140 

CTTGTCACGC GCAGCGACCT GACCAGTCAC CTCAATAACA AATTCGCTAC GAAGGcTTTC 7200 

AGCTGTTGCC ATAACCTCTG CAGATACTTT TTCAGGGTTG ATAACCAACT GCATGATTCC 7260 

TTCACGGTCA CGAAGATCGA TAAAGATCAA ACCACCAAGG TCACGACGAC GGCCAACCCA 7320 

TCCTTTCAAG GTTATTTCTT GTCCGATGTG TTCCTCACGA ACACGACCAG CATACATACT 7380 

ACGTTTCATT ATTTCTCTCC TCTTTTATTC TGTTACTATT TTACCATAAA AGCGCAGCTC 7440 

TTCATGAAAA TCATCAGAAA AGTTTGCCAG TCTTTAAAAG TCAGGTGAAA GCCCTAAAAA 7500 

TTAGCGCTAA TACTCTTCGA AAATCTCTTC AAACCACGTC AGCGTCGCCT TACCGTATGT 7560 

ATGGTTACTG ACTTCGTCAG TTTCATCTAC AACCTCAAAA CCATGTTTTG AGCTGACTTC 7620 

GTCAGTTCTA TCCACAACCT CAAAACAGTG TTTTGAGCAA CCTGCGGCTA GCTTCCTAGT 7680 

TTGCTCTTTG ATTTTCATTG AGTATAATAC AAAAATCCGA TGAACTTCAC CGGACTCTTT 7740 

TATTTTGAAT TTTTGCCTGC TTTACGCTTT TCAGCGATTT CGGCTGCCTT TCGAGGCAAG 7800 

ACAATTTCCG TTATGTAAGC CGTCCCAAAA CGCAGTACAC CTGCAATAGG AGCAAAGACA 7860 

ACTGCTAGAT AGTTATAGAA GAAATCGCCT TTGAAGGCAT AAGCTAGCGC TCCAATGATG 7920 

AAAAATAGAA CGACTGCCTG AATCACTGCT AATAAAATTA CTCGTTTCAT GTGACCTCCT 7980 

GACTCTATTA TAGCATGAGA ATCATCAAAA AGCCGACTAA ATTATTCAAA GCGTGAAGAG 8040 

AAATACTGTA GACCAGACCT TTTCTGCTAA TGTAAGCCAA ACCCAAACTA AAACCAAGGC 8100 

TAAAATAGAC AAAAAATTGT TGCACATCAC CTGGAAAATG AATCAAGGCA AATAGAAGAC 8160 

TAGATACCAG AAGAAAAATC AGGGTTCGTT TACTATTGTC CTGCTTAGGA AAGAGATAGC 8220

GTGCTAACAT CCCTCTAAAA ACAATCTCTT CCGTCAAAGG AGCAAAAATA ACCACAGCAA 8280 

AGAATGAGAA AAGTGGTTGA GACAAGGTCA AGTCTGTCGC TATTTGCTGA TTTACTGAAG 8340 

GATCATCTGG CAAGAAGAAT TGAACGACCA GAGATAAGAA CCAAACCAAG ACAGGAAGCC 8400 

AAATAAATCG ATTAAAGCCG CTCTTCTCAA TATGAACAGG AGCCTTCTGA TACCATTTGT 8460 

AAATGCCGTA CACATATACT CCAGCCAAGG CCACATAGAG TAGAGTAACA GCATAGGGTG 8520 

AAGCGCCTAA AGCAAGCGAC GCAGTCGCGA GCCCCTGAAT AAAGCCATAG ATAAATAAAA 8580 
AGGATAGAAG GGCTAGAAGA ATCCAGCCAA GGTTTTTAAG TAATTTCATA GATAACTCCT 8640

TTATTTGAAA TAACGTTTTA CCATAGGTAA CTGCATCACA TTGATATAAA CATGGATGGC 8700 

TCCTACAAGC AAGAAAGCTA GTAACTGAAT CTCTCCTGTC AAGAAAGAAA TGATAATAAG 8760 

AAAAATATAT AAGGCTGGTA AGACATATTG GTGTAATTGG AATAAAATTC GAAAACTCTG 8820 

TTCCAAATTA GCCTGACGCT CCCCTTCATC ATAAGAATTT ATATAGTTCA AGACATCCTT 8880 

TGGTGTAGCG AAAAATTCCA AATCAAACTG ACGAACAATC GCAATGGTTT TAAAAAGAGA 8940 

TTTTTGAGCG ACTAAGAATA CCACAAAGAG TAAGAAAGAA AGGAAAAATG TTTGAGGGTT 9000 

TGTATGCAAT ATAATCACCT CACTTAATGA AATAAAAATA GCCAATGGAA TCGCTACACC 9060 

TGTAATATTA AAAGCAATGG TTCCAAACTC AAGATTCCGA TACATTTGCA CATAATAGGT 9120 

TTCATTCAGA TCGTCATCCA TTTCCTCTTG ATACAAAGAA TGAAATTTTC TGCTTTTCTT 9180 

TAAGAAATTG AAAGTCAAAA ACATACTAAT GAAACCTATC AGTAAACAAA TAGCTGATAT 9240 

CCATGGCATC AAGGCTTTTA CATCTAAAAT AATTTCGTGG GATTCGACAC GTGCCTTAAA 9300 

CATCCCTACA AACATGCCCA AGAACCCCCC AAGACAATAG ACATCAAAAA TAACAATCTA 9360 

CGTTTCTTTT TCATATTCAT TCTCCTTTTT CACTTGCTAG ATTTTTGGAT TTCTTTTCAA 9420 

TCCATTCAAT TACTGGGATG AGAGCAAAGT AGACCCAAAC AAATTGGTCG CTTTGATAGG 9480 

GATTAAACCA GCTTAGGTCC ATCCCAATCA GTAGAAATAC GCTGACTAAT AAAGCTATGA 9540 

CCACTACATA ATAAATCACT TTATACTTGT TCATCACTCG TCCTCCTCCA AACGAAATAC 9600 

CGATTCGACT GTTTCGTTGA AAATTTGAGA TATTTTCAGG GCAATGATAA TGGATGGGGT 9660 

GTACTCATCC CGTTCTAGTA GGCTAATGGT CTGTCTGGAA ACCCCTGCCA GTTTGGCTAG 9720 

GTCGGTTTGA TTGAGACCAT CGCGAGCTCG AAGCTCTTTT AGACGATTTT TTAGTTGCAT 9780 

GTTACACACC TACTCTCCGT CAAATTCAAC GGTTTGGATA TCCTCAATAC GTTGCAACTT 9840 

GAATTTTTCT TTTCCCGTAT TATCTACACG TCGTAGCTTT ACCCATTCCT CATCAACATC 9900 

CACAACTTCC CAGTTATCTG GCCCAATATA CACTCCCGTT ATAATTGGTT CCTTTCCAAT 9960

CATTTCTTGT AATAATCTCG ACATTTCTGC GTTTCCTTTC TCTTTTCGCT CAAGTCTTTT 10020 

GATTTTATTC TCTAGTTTCT TGATTTTTTT AGAATTATTA GAATAAAAGA AAATCATAAA 10080 

TAGTATAAAT CCTAGTACCC ACATTATAAC TCCTTTCTGC TTCCTATTTC TTAACTTGAA 10140 

TTCATTGTAA CATATCTTTT TCTTTTTGAC AAGTATAGTT GTCAAAAAAA TTATGATTTT 10200 

TGTCATTTTG CAAAAGAAAA AGGTCAGGAG TAGGTTCCTG ACCACTTTAT CTATCATTAA 10260 

TACTCTTCTA AAATCTCTTC AAACCACGTC AGCTTCACCT TGCCGTAGGT ATGGTTACTG 10320 
ACTTCGTCAG TTTCATCTAC AACCTCAAAA CCATGTTTTG AGCTGACTTC GTCAGTTCTA 10380

TCCACAACCT CAAAACCATG TTTTGAGCTG ACTTCGTCAG TTCTATCCAC AACCTCAAAA 10440 

CCATGTTTTG AGCTGACTTC GTCAGTTCTA TCCACAACCT CAAAACAGTG TTTTGAGCAA 10500 

CCTGCGGCTA GCTTCCTAGT TTGCTCTTTG ATTTTTATTG AGTATAAAAT CCTAGTTTTT 10560 

CAAAGATTTC TGAGAAGTTT TGGCTGATTG TCTCAAGTGA CACTTGCACT TCTTCTCGGG 10620 

TTTGGTTGTT CTTGACCGTC ACTTGTCCGC TTTCGACTTC GCTCTCTCCT AGGGTGATGA 10680 

GGGTCTTAGC CGCAAAGACA TCGGCTGACT TGAACTGAGC TTTTAGTTTA CGGTTGAGGT 10740 

AATCACGCTC TGCTTTGAAA CCTTGTTGGC GAAGAGCCTG TACCAATTCC AAGGCCTTGA 10800 

TATTTGCCCC TTCGCCCAAG ACTGCGATAT AGACATCTAG GGCGTTTTCG ATAGGGAGGG 10860 

TCACACCTTG CTTTTCAAGG ATGAGAAGCA GGCGCTCTAC ACCAAGTCCA AAACCAAATC 10920 

CAGCAGTTTC AGGGCCTCCA AAGTAAGCAA CCAAACCATC GTAGCGACCA CCCGCACAGA 10980 

CGGTCAGGTC ATTGCCCTCA ATCTCTGTGA TAAACTCGAA AATGGTGTGG TTGTAGTAGT 11040 

CCAGACCACG CACCATATTG GTATCGATGA TGTAATCTAC TCCAAGATTT TCCAACATCT 11100 

GACGCACAGC ATCAAAATGA GCTTGGCTTT CTTCATCAAG AAAGTCCAAG ATAGACGGCG 11160 

CATTCTCTAC TGCCACCTTG TCTTCTTTTT CCTTAGAGTC CAAGACACGA AGAGGATTTT 11220 

CCTCCAAGCG ACGTTGGCTA TCCTTAGACA AGGTCTCCTT GAGCGGTGTC AAATAGTCAA 11280 

TCAAGGCTTG GCGGTAGGCT GCACGGCTCT CAGGATTTCC AAGAGTGTTG AGGTGCAATT 11340 

TGACACCTTG AATACCGATT TCCTTCAAAA AATGGGCTGC CATAGCGATT GTTTCCACAT 11400 

CGGTAGCTGG ATTGCTAGAG CCAAAACACT CAACACCAAT CTGGTGGAAT TGGCGCAAGC 11460 

GCCCTGCCTG TGGACGCTCA TAACGGAACA TAGGTCCCAT GTAGTAGAAC TTGCTTGGCT 11520 

TTTGCACTTC TGGGGCGAAA AGTTTATTTT CCACATAGGA ACGGACAACG GGTGCAGTTC 11580 

CTTCTGGACG GAGGGTAATA TGACGGTCAC CCTTGTCATA AAAATCGTAC ATTTCCTTGG 11640 

TTACGATATC CGTTGTATCT CCGACAGAGC GACTGATAAC CTCGTAATGC TCAAAAATAG 11700 

GCGTGCGCAC TTCTGCATAG TTGTAGCGTT TGAAAATCTC ACGGGCAAAG CCCTCAACGT 11760 

ACTGCCACTT AGCAGACTCA GCAGGTAAAA TATCCTGCGT TCCTTTTGGT TTTTGTAATT 11820 

TCATAGGGAA TCCTCTTTAA ACTTAATAGT CTTATTTTAC CATAAATAGA GGGATTAAAA 11880 

CAGTAAGAAA AAAATTAGGA TTTAGATATC ATTTTTGAGA TTAAGAATTG TCAAAAAAAT 11940 

AGCTAGCAAG GAAAGACCAA CAAATAGCAT CCAAGTCAAC TGTATATTCC ATACGGCTAC 12000 

TAGTGAAAAA CAAGCTGTTC CCACAGGTAT GGATAAGGTA AACAATAGAC CTAAAAAATT 12060 

ACTAGTACGA GCTAGAACCT CTGGAGCTAG ATTTTTCATG AGCATGGCAC TAATCTTTGG 12120 
TTGAACTTTA CCAGACACAT ACAGAGTAAA GAAGAGAAAT AGCAAACCAA GCACGACTTG 12180

ATTGAATAAA TTAGCCAAAC CAACTAGACT AAGTCCTACG GTCTCCCACA TCATCAATCT 12240 

AGGCAAGGAC TGCTTCCCAA AATAATCATT GCCCGTAAGG CTACTGATGA TGACTGATAC 12300 

TAAAACACAG AATTGATTGA TAAATAGTGC CTCTGTATAA GAAAAATTCA AGAGAGAATG 12360 

GCTCAAAAAG AAGATATTAT AAATTCCACC CAAAGCGCCA CCCAAGGAAT TAATAAGCAA 12420 

GACAGCAAAG AGCATAAAAC CAAAGTTTTT CTGTCCACTT TTAAGAAAAA CGAGACGTAA 12480 

ATTTCGGTAA ATTGTTAGGA ACTGGTCTTT GATAGAAAGC TTCTCATTTT TTAAGTTTTC 12540 

ACCATCAGCA GATGACATTG ACAGGCTCAA TTTGCTTTTT CCTAAAAAGA GGATAGTGGC 12600 

TGATACTAGG AAAAAGCAGG CATTGATTCC CGCAACGAGA GAAAAATTGT TGACCGATAG 12660 

AGCTAAGAGC CAGACTCCGA AAGCTTGACC ACCAATAGCT GAAATATAGG TGATGAACTG 12720 

TGAAAAAGAA TAAGCCTCCA TCAGATCATC TTCAGCTACT TTTTCCTTAA TAAGAGGCAT 12780 

ACGCAGGCCA CCTGCAAAAT CACTGATGAT ATCACTAATG ACATTGATCA AACACAGGCT 12840 

AGAAAAGGCA AAGAGACTAG CTTGCTGAAC AACTAGGGCT GCTAGAAAAA ATAGAACCGC 12900 

CTGAAACAAA CCGCTATAGA CCATCCATTT GACCTTGTCC CTCGTGTAAT CTGCCCGAAT 12960 

CCCTGCAAAA ACTGTAAAGA GGGTCGGAAG AATCATGACA ATATTCGCCA TAGCAACAGC 13020 

AAAAGATGCT TGTGACAAGG TCGATGCATA GACGATAAAG ACCAGGTTGA AAATCGAAAC 13080 

ACCAAAAGCA TTGAAGAAGC GTGG 13104 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19250 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

CCGGGCAAAT AGTTTTGAAC TTTTCATCAT TTTCTCCTTT AAAACTTTCT CTCCATTATA 60 

GACTCTTTTC AGAAAGTTGT CAACAGAATT TTCAGAATTT TTGAAAATTA TTTTTCAAAC 120 

AACATCTTTG CAAAAAATAT GAATATCGTA AGCGCGTCAT AACAAGGTAT CTATCATTCA 180 

TGGAGCTCCT CCTGTATACT ATTAGTAAAG TAAATATTGG AGGATATTTT AATGCCACAA 240 

CCTATTGTTC CTGTAGAGAT TCCACAATCT CGTCGTTTTG ATTCTAAAAA GAGAAATGAT 300 

ATTCTrCTTA AAATTCGTAT TGGCAAGCTT GAAGTAAGTT TTTTTCAATC TCTCAATCTC 360

GAAATGATAG AACAGCTTTT GGATAAGGTG TTGCTCTATG ACAATTCATC TATCTAGCCT 420

AGGGCAGGTC TATCTCGTGT GTGGGAAAAC TGATATGAGA CAAGGAATCG ATTCACTGGC 480 

TTATCTCGTT AAAACCCACT TTGAATTGGA TCCTTTCTCC GGTCAAATCT TTCTCTTTTG 540 

TGGTGGACGT AAAGACCGCT TTAAAGTCCT TTACTGGGAT GGTCAAGGAT TTTGGCTACT 600 

ATATAAACGC TTTGAGAACG GCAGACTGAC TTGGCCCAGT ACAGAAAAGG ATGTCAAAGC 660 

TCTCGCACCT GAACAAGTAG ATTGGCTGAT GAAAGGCTTT TCTATCACTC CAAAAATATA 720 

GTAGATTGAA ACTAGAATAG TACACCTCTG CTTCTAAAAC ATTGTTAGAA ATCGATTTTA 780 

CTGTCCTGAT CGATTTGTCC TGTTATTATT TCATTTTACT ATAAATCCAT CAGAAAGTCG 840 

TGATTTCTAT TGAAATGAGG ACTTTCTTTT TATACTCATC TGCTTTCAAA AAGCACTCTA 900 

GTCCATCTCC GATTAACGAT GGACTTTATC ACCTCCTTCT CCAGTCCTTG TATAACATCT 960 

TGAAGTTGAT TCATGACATC TTCCAAAGTT CGAAAGGCTT TATTCTTAAA TCCACGTTTA 1020 

CGAATCTCTT TCCACACTTG TTCAATGGGG TTCATCTCTG GTGTGTATGG AGGAATAAAT 1080 

GCAAAGCCAA TATTAGTCGG AATCTTTAAG GTACTTGATT TATGCCATAT AGCATTGTCC 1140 

ATAACGAGTA AAAGATAATC ATCTGGATAA GCTTGTGAAA GCTCCTATTC CTAAAGCCCC 1200 

TTTATAACCT CTTGCGAGAG AGACTATTGA CTCAGCCCTT ACTTCATGCG GATGAAACCT 1260 

CCTATCGGGT TCTAGAGAGT GATAGCCATC TGACCTACTA TTGGACTTTT TTGTCAGGTA 1320 

AAGCAGAGAA ACAAGGGATT ACGCTTTACC ACCATGATCA GTGTCGAAGT GGTTCAGTAG 1380 

TACAAGAATT CCTAGGAGAT TATTCTGGCT ATGTTCATTG TGATATGTTG CGGCAGTAAC 1440 

TTAGGACTTT AGTCCTCTAG TTCTGCCTAT GCGATAGCAG TCCAAGGTTT AGGAGTAAGG 1500 

CGACGCTAAG CTTGGTAAAC TGCGAACAGC TAGAAGCTTA TCGTCAACTG GAAGAAGCTG 1560 

CACTTGTTGG ATGTTGGGCG CATGTGAGAA GGAAGTTTTT TGAAGTGCCC CCCAAGCAAG 1620 

CAGATAAATC ATCCTTAGGA GCTAAAGGTT TAGCCTATTG TGATCAGTTA TTTTCCTTGG 1680 

AAAGAGACTG GGAGGCTTTG CCAGCTGATG AACGGCTACA GAAACGTCAA GAACATCTCC 1740 

AACCCCTACT GGAAGACTTC TTTGCTTGGT GCCGTCGTCA GTCAGTTTTA TCGGGTTCAA 1800 

AACTAGGAAG GGCAATTGAA TACAGCCTCA AGTATGAAGA AACCTTTAAG ACCATTTTAA 1860 

AAGACGGACA TCTGGTCCTT TCCAATAATC TAGCTGAACG CGCCATTAAA TCATTGGTTA 1920 

TGGGACGGAG TAAAAGAGTC CAGTGGACTC TTTTAGCCTA AGCTCAGTTT AAAAAAACGA 1980 

GGGTGGTTAT TTTTAAAAAA GCGAGGGTGG TTATTTTCTC AAAGTTTTGA AGGAGCTAAA 2040 

GCAAGAGCTA TTATTATGAG TTTGTTGGAA ACAGCTAAAC GTCATCAATT ATAGTGCGTT 2100 

GAATCTATAA CAGTACGCAT CGACTGCTAA AATATTTCTA TAAATCAATT TTCCTTTCCT 2160 
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AATCGATTTG TTCATATCTT ATTACAATCC ATTATAAATA GCGAGAAATA TCTATCCTAT 2220 

CTTCTAGAAT GTCTTCCAAA CGAGGAAACT CTCGTAAACA AAGAGGTTTT AGAGGCCTAT 2280 

TTACCGTGGA CTAAAGTTGT ACAAGAAAAG TGCAAATAAG AAATCTCCAG ATTAGGAACT 2340 

ATATATGAGT TCTCTAGTCT GGAGATTTTT CAATAGACTT CGTTATTGGG CGGTTACTTT 2400 

CGAAACTTTG AAAACTTCAA AAAACGGATT TTTATCGCTC TGAACATCAA AAAAGAAAGG 2460 

ACGAAATTTG TCCTTTCTCA AGCTTAGCTT TTCTTCAACC CACTACAGTT GACAAAGAGC 2520 

CCTTTATTCT ATCAAACATG AAGCGCAAAA ACAAGCCAAA AATCCGATAG AATGGCTATC 2580 

CCTCGACTAT CAAGTAAGAC ATTTCCATCA AATACGTTCA ATTTTACTCT TGTTCTACTA 2640 

AGAATTAATC ATCTCGTTTT GATTTATTAA AAATATACAA TTCAGCTTTT CCTCCAAACT 2700 

ATTTTATCCA CTATCCCTGT ATAGCTCTGT ATTATCTTAA CAACTTTAGT AGAGACATTT 2760

TCCTCAACAT AATCCGGAAC CGGTAATCCA AAATCCTCAT CTTGTGCCAA GCTAACAGCA 2820 

GTTTCAACTG CTTGAAGAAG AGAATTTTCA TCAATGCCTG CCAAAATAAA TCCTGCCTTA 2880

TCTAAGGACT CAGGACGTTC TGTACTTGTA CGAATACATA CAGCGGGAAA AGGATAACCT 2940 

TGACTAGTAA AGAAACTACT TTCTTCCGGT AAAGTTCCCG AATCAGATAC TACAACAAAT 3000 

GCATTCATCT GTAAACAATT ATAGTCATGG AATCCTAGTG GCTCATGCTG AATCACACGT 3060 

TTATCTAGTT TAAAACCGCT CTCTTGTAGC CTTTTCTTTG ATCTAGGATG GCAAGAATAT 3120 

AAGATTGGCA TATTATACTT TTCAGCTAAT TGATTAATTG CTGTAAAGAG AGAAATAAAA 3180 

TTTTTATCTG TATCAATATT TTCCTCACGG TGAGCTGAAA GTAAGATATA ACCTCCTTTT 3240 

TTCAATCCCA AACGTTCATG GATATCTGAA GACTCAATAG CAGATAAATT TTTATGTAAC 3300 

ACTTCTGCCA TAGGAGAACC AGTTACATAT GTGCGCTCTT TAGGTAAACC ACACTCATGT 3360 

AAATACTTAC GTGCATGTTC AGAGTATGCT AAGTTAACAT CTGAAATAAC ATCAACAATC 3420 

CGACGATTAG TCTCTTCCGG TAGGCACTCA TCTTTACAGC GATTGCCAGC CTCCATATGA 3480 

AAAATTGGAA TATGTAAACG CTTGGCAGCA ATAGCTGATA AACAAGAATT TGTATCCCCT 3540 

AAAATCAATA AAGCATCTGG TTTAATTTGA TTCATCAATT TGTATGAAGT ATTAATAATA 3600 

TTCCCTACAG TAGCACCAAG ATCATCTCCA ACAGCATCCA TGTATACGTC CGGAGTGTCT 3660 

AACCCTAAAT TATCAAAGAA AATACCATTT AAATTGTAAT CATAGTTTTG TCCAGTATGT 3720 

GCCAAAATAA CATCAAAATA CTTTCGACAT TTAGTGATAA CACTACTTAG ACGTATAATC 3780 

TCTGGACGTG TTCCCACAAT AATCAATAAC TTAAGTTTGC CATTATCTTT AAAGTGAATA 3840 

TCACTATAAT CTGTCTTAAT TTTCATTTAT TTCTCCACTT GTTCAAAAAA AGTATCTGGA 3900

TGTCTAGGAT CAAATGACTC ATTAGCCCAC ATGACAGTAA TTAGATTTTC TGTATCAGAA 3960

AGATTAATAA TATTATGTGC ATAGCCCGGT ATCATATGTA TTGCTTCAAT CTTATCGCCC 4020

GACACTTCAA AGTTCAGAAT AGGATACTCT TGACCGTTTT CATCCAGCCC TATCCTACGC 4080

TCTTGTATTA AAGCACGACC AGAAACAACC ATGAAAAATT CCCACTTAGA ATGATGCCAA 4140

TGTTGCCCTT TGGTAATGCC AGGTTTAGAA ATATTAACAG AAAATTGACC CGTATTTTCT 4200

GTTTTTAATA ATTCCGTAAA ACTACCTCGT TCATCTATAT TCATTTTTAG AGGAAACTTA 4260

AACTTATCTA CTGGTAAATA AGATAGGTAG GTAGAATACA ATTTCTTTTT AAACGATCCC 4320

TGAGGAATTT CAGGCATAAC TAAACTATCA GGCTGTTTTT TAAATGTTTC TAATAGAGAG 4380

ACAATCTCTC CTAAGGTTGC ACGATGAGTC GTTGGTACGT AGCAGTAGTT TCCTGATGGG 4440

CTAGGTAAGA TTTGTAATCC ATCTAGATTA CAACGATGAG GATTTCCTTC CAATGCAGTT 4500

AGACACTCTT GTATCAAATC ATCAATATAC AGCAACTCCA ATTCTACACT TGGATCATTT 4560

ACTTGAATAG GTAAATCGTG AGCTAGATTA TAACAGAAAG TTGCTACAGC AGAATTGTAG 4620

TTAGGACGGC ACCACTTCCC ATAAAGATTC GGGAAACGGT AAACTAAGAC AGGTGCTCCC 4680

GTTTTCTTTC CATATTCAAA GAAGAGTTCT TCCCCTGCTA GCTTAGATTG TCCATATATA 4740

GAGTTTGAAA ATCGGCCTTC TAAACTAGCT TGAGTAGAAC TTGAGAGTAG AACAGGACAA 4800

GTGTTTTCAT ACTTTTCTAA AATCTCCAAT AATCTACTTG AAAAACCGTA ATTTCCCTCC 4860

ATGAATTCAT CAGGATTCTG TGGACGATTG ACACCAGCTA AATGGAATAC GAAATCGGCC 4920

TTCTTACAAT ATTCATCTAA TAAAATCGGA TCTGTATCAC GATCATACTG AAAAATCTCT 4980

CCAATCTCTA AATTAGGACG AGTCCTATCT CGTCCATCTT TCAAAGCTTC CAGAGTACAG 5040

ATAAGATTTT TTCCTACAAA TCCTTTCGCT CCTGTGATTA AAATATTTTT AATCATGCCC 5100

CCTCCTTATT TTATATGCTG TTTTAATAGT TAACTCTCTC GACAATACAT GATACATTAT 5160

ATATCCTTGA TAATTTTAAT




AR A VPTT A f* A 




iw. 1 ALIA 1 A 




TCACGAATTG CTGTCTGTAT TTCATCTAAT TCTAGCAACT TTCTTTTAAC TTGCTCTACA 5280

TCCATCAAAT CGGTATTATT ACTATTGAAT TCTGTCAACA AATTTCTATT CGTACTACCA 5340

TCTTTGAAAT ACTTATCATA GTTAAGATTA CGATTATCAC TAGGAACTCT ATAAAAATCA 5400

CCCAAATCAA TTGCATTTGC GCACTCTTCG TTAGTTAATA GTGTTTCATA CCTTTTTTCT 5460

CCGTGTCTAA TACCTATAAT CTTAATATCT TGTTCTGAGG CAAAAATTTC TGATACAGCC 5520

TTAGCCAACA CTTCAATCGT ACATGCTGGT GCTTTCTGAA CTAGTATATC TCCAGATTTC 5580

CCTTCTTCAA ATGCAAATAA AACCAAGTCT ACTGCTTCTT CCAATGTCAT CACAAAACGT 5640

GTCATGCTAG GTTCAGTAAT TGTAAGAGCA TTTCCTTGCT TAATTTGCTC AATCCAAAGA 5700

GGAACGACAG ATCCACGGCT ACACAGAACA TTCCCATAGC GAGTCACACA TATCTTTGTA 5760

TGCTCAGGAT TTACCGTCCT GGACTTAGCA ACAGCAATCT TTTCCATCAT AGCCTTGGAT 5820 

GTTCCCATAG CATTGACAGG ATAAGCCGCC TTATCTGTAG AAAGACAGAT AACTTGCTTT 5880 

ACACCAGCTT CGATAGCCGC AGTGAGGACA TTCTCCGTTC CCAAAATGTT AGTTTTTACC 5940 

GCTTCTACAG GGAAAAATTC ACAAGAAGGT ACTTGTTTAA GAGCAGCAGC GTGAAAAACA 6000 

TAATCCACAC CATGCATAGC ATTTTTTACC GAAGCTAAGT CACGCACATC TCCAAGGTAA 6060 

AAACGGATTT TCCCAGCCAC TTCTGGTACT TTTACCTGAA ACTCATGACG CATATCATCT 6120 

TGTTTCTTTT CATCTCGCGA AAATATACGA ATCTCTGAGA CATCTGTTTC TAAAAAACGC 6180 

TTGAGAACCG CATTCCCAAA TGAACCTGTC CCTCCTGTAA TTAGGAGAGT TTTTCCTGTA 6240 

AATTGTGACA TATATTACAC TTCTCCTTCT AGTATGTCTG CAATTTTCTT ACAAGCCGTT 6300 

CCATCTCCAT ATGGATTTGA AGCTTGACTC ATTGCTTGAT AAACTGAATC ATTTTCTAAT 6360 

AATTCTTTAA AATGCCTATA AATATTATTT TCATCAGCAC CTACAAGTTT CAAAGTCCCT 6420 

GCTTCAATTC CCTCTGGACG TTCAGTTGTA TCTCTCATAA CCAAAACAGG TTTTCCTAAA 6480 

CTTGGAGCCT CTTCCTGAAT ACCACCACTA TCTGTTAAAA TTAAATAACT TCTTGATAAA 6540 

AAATTGTGAA AATCTAATAC TTCTAAAGGT TCGATCATCT TGATACGTTC ACAGCCACTT 6600 

AGTTCTTCCT CAGCAATTTG GCGAACACGA GGATTCATAT GGATAGGATA AATAGCCTTG 6660 

ACATCTGAAT ATTCTTCAAT AATCCTTCTA ATTGCTCTAA ACATATGTCT CATCGGTTCA 6720 

CCAAGATTTT CACGACGATG AGCTGTAATT AGAATAAACC TGCTTTCTCC TATCCATTCT 6780 

AACTCAGGAT GCGTATAGTC CTCTTGAATT GTAGTTTGTA AAGCATCAAT CGCCGTATTA 6840 

CCTGTCACAA ATATGCTCTC TGGAGTTTTT CCTTCTCTTA AAAGATTATC TTTTGAAAGT 6900 

TGTGTTGGTG TAAAATGATA CTGAGCCAAA ACCCCAACTG CTTGACGATT AAACTCTTCA 6960 

GGATATGGTG AATAGATATC GTAAGTGCGC AAACCAGCTT CAACATGACC AATTGGAATC 7020

TGTAAATAAA AGGCCGCCAG TGAACTAGCG AAGGTCGTAC TTGTATCCCC ATGAACTAAC 7080 

ACCAAATCAG GTTTTTCTGA CTCTAAAATA GCCTTCATTC CTTCCAAAAT GCCAATGGTC 7140 

ACATCAAATA AAGTTTGTTT ATCTTTCATA ATAGACAAAT CAAAATCGGG AATAATCCCA 7200 

AATGTGTCCA AGACCTGATC CAACATTTGA CGGTGTTGGC CCGTAACGCA AACTAATGTT 7260 

TCAATATTCT TACGTGTTCT TAACTCTTTG ACCAAAGGAC ACATCTTGAT GGCTTCTGGA 7320 

CGAGTTCCAA ATACTACAAC TACTTTTTTC ATATATTTAC TTACTCCTAA CAAATAATGA 7380 

ACGGTTCTTA AAATAAATTA GATAACGGCT AATCCATAAC ACCACCTCAG ACATACTTGA 7440

ACAAATAGCT AATGTTACTA AACTAAAATT ATCAGACAAG ATAAATATTC CTAATOCCAA 7500

AGTTTGGACA ATCGAAGCTA ATATAGTTGT CATTGTAGTT TCTTTCACTT TATCAATAGC 7560 

TCCTAAGACA GGCCATCCGT AAATCATAGA ATAAAAACTA GCAACAAAAG CGGGTAATAA 7620 

GTACTTAAGA AAATCTGCTG AAACGGTATA TTTTTCACCA CCAATTATAG AAAGAATTTG 7680 

ATTTGAAAAG AATAAAACTA TCAAAACTCC AAAGATAATA GGAATAAACA TAATCCGATT 7740 

AATACTCTTA ACCGATTGTA TATCTTTAGT ACGTATCATA TGCGGATATA AACTATTCGC 7800 

TATAGGATTA TACAATGATT TTGCTGCTGA AAGCAGTTGC ATTGCTATCC CCCAAAAGGC 7860 

TATCTCTTGA CTTTGTAAAT AAAAACCCGA AATGACTGTC GTAAAGACGC CAAAAATAGT 7920 

AGTTGCAAAA TTGGATAAAA AATAAATAGA GGATTCCTTT AAATCTTTAA CCCAAACAGA 7980 

CAGATAAGAA AATGATAATT TAATTCCATA ATAATGAAGG AATCTATAAG AAACTACTGC 8040 

AGCAACTAAA TTCCCAATTC CTTCCAATAT AGGAATCCAT AAAATAGAAG AATCATCTTT 8100 

TACTACAATA AATGTCAAAA TTGTAATGAT AGTTTTAGAA ATAATATAAG GAATTGCAAC 8160 

TGCATGCATC TTTTCAATTC CACGAAATAA AAAGTCAAAG ATAAAAATAT TGGTCACTGT 8220 

AGCTAACAAA TAAAAAACTG AAAAAAGAAT ATTCTCTCTC ATTATTGGGA TTTGCCACAT 8280 

CAATATGGTG TAAATTAGAA TCGAAATGAT AGATAAAAAT ATTTTTTCAA CTAGAGTATC 8340 

TCCAACTATC CTTCCAATCT TTGAGGGAGT AGTACAAGCA TTTACAATAT TTTTTGTAGC 8400 

TGATATCATG AAACCAAAAT CAATCACCAG TTGAACATAA GCTATTAACG CTTTAACATA 8460 

AATAACCATT CCATACGCGT CTAGCGAAAG CACCCTTGTC AAATACGGGA GTGTTAATAA 8520 

AGGAAATAGT AATTTAACAA TATTCAGAAT ATAGAGAGAA CTTGTATTTT TTATAAATGA 8580 

AATTCTATCA ACTTTCACGA ACTAGTCCTT CCAAAAAAAG ATCTAAATAG TCCAAACTAC 8640 

TTCTCGCTTT CAACACCAAT TCTGAAGGTA TTGTTATCGG TTTTAGATGA AAAGTTTCAA 8700 

GTTTCTTTAC AATACTATTA ACACTTGAAT CAAATAAAGA TTCACAACGT TGTAACTCTC 8760 

CAATTGCTCC ATAATAACGT GCTGTTTTTT CTGGATGGCA TGCAATGGCA ATCACAGAT I 8820 

TATTAAAACA TGTTGCCACT ACCCCAACAT GTAATTTACA AGTTAAAACC ACATCTACCA 8880 

TTTTCAACAA TGATGTCATT TCTGCAGGAG AATGATACTT GAATTGAAAA CAATCCTCAG 8940 

TTCTAACTAA TTTTCTAAAT TCCTGATAAT AAGCATCTTC ATAAGGTAGA ATGGAATCCG 9000 

AAGTTACTAC AACATAATAG TTAGGATTGT TTTCTAGAAA AAGACTAATT GATTCCGCAA 9060 

ATTTTTCAAG AGCTTTTTTG GAATGATTAT AGTGAACAAG AATTATCTTC TTATCTTTAG 9120 

CTTCTCTTTT CAATTGACAC AGCTGCTCTG TTTTTTCTTC TCTTAATTTA CTTGAAATAA 9180 

TTAAATCAAA GGTTTCATGC ACTGGAGCCG AAGGCGACAA ATGCTTCAAA GAATCAAATG 9240

ATTCTCGATC ACGAACTGTA ATAAATTGAG CATGATTAAT AATTCTCTTT ATACCATAAT 9300

TCATCAAAGA ATCGTTATTA GGCCCTGCAC CAATACCTAA TACTCCTATA GGCTTTTTAA 9360

AATATGAAGC CCAAATTCCC AAAGGTAAAA ATCGTTTAAA TTGGATTAAA TTATCACGAA 9420

AACGTGCATT ATGCCCTTCC CCAAAATATC CTCCCGGGAT ATACAAAATA GCATCTGCTT 9480

GTTTTTTAGT AAAACTTTGT TTTTGGCGAT ATTCTTTCAA GTACATTTGA AAGAAATCTG 9540

ATGGATTATA AAAAGAAACT TCATATCCTT TAGATTCTAA TAAATCATAG ACAATCTCAC 9600

CGTAAAGATA ATCACCGTAA TTACTTGAAC CATAATCCGT TGCACCATGT AACATAATTT 9660

TTTTCACCAC TATTTTTTCA ACCTCCTAAA AATAAATATC ATAATCAAAC TATACATAAT 9720

AGGACGATAA ACATCTATTG AACTACTTCT CACTAAAAGC AATAGTTGAG AAATTACCGA 9780

AAAATAAATA ACTTTTGAGA TTTTACTTGT TTGAAAAGCT CTGAAATTTA ATCGCCATCC 9840

ACTAAATATT CCCAAAACAA AACTCCAAAA AACACCACCA TAGTAACCAA AGTTCCAAAA 9900

TAATTCTTCC ACAAAAGAAG AGCCTACAGG TAACCCCAAA AATTTATTAA TAACAACCGT 9960

CGCTGATGCT TTATCAAAAA AATCACCAAC TAACCATCCA ATAGGAAAAA TTGATAGGAT 10020

AGTGCGTAGA AATGTCATCC CATATTCATA TGGAATGCTA CTAGGCACAA CAGTTACAGC 10080

AGAAGCTACT GTTAGGCTGG TCAGTCCCGA CTCTGAAAAT ACTTCCCCTA GTATATTCTT 10140

TACAAAATCT AATGAAGAAA AGGAATCAAA TAAGTATATA CCTATAGTAT TCAAGTCGAA 10200

ACGGTGCCCC CTAATAACAA CTAATACATT TAATAGAAAT ACAGTTACTA TTAAAAATAC 10260

AAGTACTCTT TTCTTCGAAA AAGTAATCCC TAAAGATTGT GTGTATACTA AAACCAACGC 10320

CAAGATTGAA AACACCTGGA TTTTACGACT TCCTGTTAGG ATCATTATCA AAATTAGGTA 10380

AAACAACATT ACCCAAAAAA TAGTACGCTT TATAACTCGG GACAGCTTAT CTGAATAAAA 10440

CAAGGAGAAC ACACCAGGAA GCATAAGTAC TCCTAAATCA TCTATTATTC CTGAACTAGC 10500

TGCCTCTGAA TATGCTGAAT AGCTATTCGC CGCTCTAACT GCTAGTACTG TTTTAGAATC 10560

AGTTATTACC CTAGAAATAA AGCCCACTCC TGTTAAAATC CTACCCGCAT TGTACAAAAT 10620

TTTCTCTTCA TTTTCCTGAT AATTTTGTAC TTCTGAATGA TAATGTACCT TTCCATCACT 10680

ATAAAAAAAT AAATAGCCTA CAGAATAACA AAACAAAATC CAAATTATAA AAATATATGA 10740

ATGAAATAAT TCTTCATTAT TATAGAAGTT ACTAGGGCTC CACAGCAGAG TTGTTTGAAA 10800

CCCCATATAC TCATTGAAAA TTAATCCAAA CATAAAAAAA TAAGATAAAA TCAGATACCA 10860

TACAGAAAAA TCATATATAC TAACTTTTTG TAAAATAAAA CCAGTAATTT GAAAAATAAT 10920

TAGAAAGCAA ACCCATATAA ATATAGACGG AACATAATTA GATATAAGAA AACCATTATT 10980

CCAATTATCG AGAGTCCAGA ACAAGTAACA GAAAGCAAAT ATAAAACTTA ATGTCACTAG 11040

TGTCACTCTA CAAATATACT TTGTCTGCAT CTATATCTCC TTTATTACAC ACATTTCTTG 11100

ATAACGATTC AATAATTTAC TAGCTTGATA ACAAATATCA TAGAGTCCAT CTGTCATACT 11160

GTTATTTATT TCAAAACGAT TGCATTCCTC AGATGTTAAA GACAGTACTT TATCTTTCCA 11220

TAGCAACACA GACTCTTCGT TGATAGGTAA GTAACTAATG TTTTTGGTCA CATCTACTTC 11280

TTGCGTCACT GTATCTGACG ATAAAATTTG TAATCCCGAT GCCTGAGCCT CTACTAGAGA 11340

AACAGGCAAC CCCTCATATT TAGACGGAAG CAAAAAAACA TCCATCGCAG ATAATAAATC 11400

AGAAATATCA GTCCTTCTCC CTAAAAATAG CACATATGGG GTCAGATTTA GTTCTAAAGC 11460

TTTCTGTTTT AATTTCTGCT CATCCTCACC ATTACCAACT AGGAGTAAAA TAACATTTGG 11520

TTTGATTAAA ATGAGTTCTT TTAAAACGTT AAATAAATAA CTTTGGTTTT TTTGATCTGA 11580

TAGGCGAGCT ATATTTCCTA ATACGAACTT ATTTGACACA TCTAATTCTC TACGACATTT 11640

TTCTCTAACA TCTGACAAAA ATTGATACTT TTTCAAATCA ATTGCATTAA AAATAATTTC 11700

AATTTTTCCG TCTTTATACG CTTTCTCTCC ATATAACCAC TTAGCCGAAT CTTCCCCACA 11760

TGCAAACCAA TGAGTTGCTA AGATTTTTAC CAAAATTGTT ACTAATTTAC GCAATACTTT 11820

TTGAAAACTG TTTTCTGTTA CATAAGCCAT ATGACTATGA ATAATTCTAA TTTTACAACC 11880

AATTATTTTA GATAAGATCA GACCAATTGC AGATTTATAG CCATGGCAAT GAACTATATC 11940

ATAATCTCCT TTCTTTATTA TTCTAGCAAG AGAGAGAAAC TGATGTAGAG GCTTTTTCCT 12000

TAATAGAGGC ACATGATAAA CCTTTGCACC CAATTCTTTC ATTTTATCCT CTAAAAATCC 12060

TTGTTCTTTT CCAGGCACAA TAAAATCAAA TTGAATTTTT TTTCTATCAA TGTGAGAATA 12120

ATAGTTGAAT AGAAAACTTT CTACTCCACC ACTATCTAGT GTTGTAAATA GATGTAATAC 12180

TTTAATCATT CTTCTTCCTT AAGCTTAAGA TTCGCTTCTC TAATTCTATT TCTGTTTTTT 12240

GTTTTTCTAA ACTAATTCTG TCCATGAAGT TATCACAATT CTTAATTAGC TGTTTCCTGT 12300

CAAGGTTTTG AATATACAAA GCCAAACAAT CTTTTTCCGA TTCATCCTTC ATAGGTAAAA 12360

CGAAACCAAA ACCATTCTCT ATTGACACTT TTTCCATATA AGTATCTTCA CAAACTAAAA 12420

TAGGTTTATA CAACAATGCA GCAAAGTAGA GTTTATTAGA CAAAGCATAG TCTAGTAAGG 12480

GAGTGTGATT CCCGTATAAA TTCAAAACAA CATCTGTATT CTTATAAAAA GACATGGTAT 12540

CTTTAGGCTG GAATGTGTCC ACCAAGTTAA CATTGCTGAT ATTTTTTTCT TGACAAAATT 12600

CCCTTAATTC TCCTGCATTA GTACCTATAA AATTCAACTG AAATCGACTG TCATTTGCAA 12660

AAAAATCGAT TATTTTTTTA TTTTGTTCTT GAAAACGAAT TAAACCAATG TAGGAAAGTT 12720

GAATTGGAAA CGTACTATTA TTTTTTAACT GCTTTACCTC GTTTAATTCT ATCATATTGG 12780

GTAGGTTATG GGTAGTAAAA TACTCTCCCA TTGGTAAAAA AAATTTATAG CCGTCTGAAG 12840

AAACGATATT CATTAAAGAA TTTTTCACCA ATTGTTTCTG AACCAAACGA TAAACCAAAA 12900 

ATTTTTCATA ACTGTAATCA CGAATATCAT AAATATATCT ATTTTTAAAT GAAAAGAGAA 12960 

GAAAATCTAC TAAAATGAAA GACACAATAC TATGTAACGG CAATATCATA TCATAATCAT 13020 

TTTCTTTTAG CTTCTTTTTA ATTTCTTTTC TGAATTTTAC ATAACCTAAT ATCTTACTTA 13080 

ATTTTCCTTT ACCAGAAAAA GAAATACGAT AGTAGTTTTG TTTTGTAATA ATCTCGTTAA 13140 

TATTCTTATC CCAATATATA ACATCGTAAC TAATAGACAG TTTCTTCAAT AATTCTTTAT 13200 

AAAAATTGAA GTAAGGAGTT AGATATATAT TATCAGATAG TATAAACAGT ACTCTCATTA 13260 

AATTATTCTT TCTTACTTTC CCTCTCTAAA CATGTCTCCA GTTCGAGCAT AAACTGCTCT 13320 

TTTGAAAAGT GATTTTCATA GTAACAACGA GCTTTCTTTC CTAACTCTCT TTGTCTCTTA 13380 

ATAGATAACA TACTAAATTT ACAAATATTT TTTGCCAATT GTTTTACATC TCGTTCGGGA 13440 

CTAACATATC CACAATTTGC TTCTTCTACA ATTATTTTAG CATCTCCTGA AATTGCACCT 13500 

ATAATTGGTT TGCCTGCCGC CATATAAGAk TGTACCTTCC CAGGTATAGT ACGAGAAACT 13560 

ATCGAGTCTC CTATTAAAGA AACTAACATA GCATCTGATT TTTTATAGAA GGATGGCATT 13620 

TCCTCCAAAG AACGTCTTCC ATAGAAGGAA ATATTCTTTA ACTCCAATTC ATGAGCTAAT 13680

GCTTTCATGC TTAACAATTC CGTACCATCT CCAACAAAAT GAAAATGAAT TTTCTTGGGT 13740 

AAATTGGTAT TCTTCTCTAT CAAACTGGCA GCTTTCAAAA TAGTTTCCAA ATTTTGTGCT 13800 

TTGCCAATAT TACCAGCAAA AGTTAGGTCA ACACTTTCTT TATTAACTAT AGATTCATCA 13860 

GGGATAAAAA GATCTTCTGC ATATTGTGGC AAATATGTAA TCTTTTGTTC GGATATGTCA 13920 

AATTGCTTCA CAAAATAATT TTTAAATGAT GGACTAGTGA CAAATATATA ATCACTAGCT 13980 

CGGTAAACTT TTTTTGAGAT AAATTTAAAC AGCTTGAAAA TCAAGCCATC TTGTTTCACT 14040 

CCACCTACGG TTAAACTATC TGGCCAAACA TCCATACAAT ATAGAAACAT CGGTTTCTTA 14100 

TATTTTTTTT TATAAGCCAT ACCAGCCCAT GCCATCATAA CTGGAGACAA TTGGTTAACG 14160 

AATACACAGT CAAAATTCGA TCCATCTTTC GTTTTATACC TCCCCAATAA AACTCCTAAA 14220 

GTAGAACTAA TTGCAAAGCT AAAATAATTC AACAATCGAA ATACAACACT TTTTTTTCTA 14280 

GGGATTGTAT AAGAACGATA TATCGTAACA CCTTCTATAA TCTCACGTCT TTTTTTATTA 14340 

TGACGATAAT CTGCATATAT CTTCCCTTCA GGGTAATTAG GAATCCCAGC CAAAACAGAG 14400 

ACTTCATGCC CTTTTCGAAC TAAATCTTCA CAAATATCTG ACAACCTGAA TGGTTCTGGC 14460

TTATAATGTT GGCAAACAAA TAGTATTTTC ATTGTCCAAT TTAACTTTCT TTCTTACCAC 14520

TACCCTCTAC AATACCTTTT CGTTTCAGTA CGTAAGGTAT TGTCTTAACT ATACATCTAA 14580

TATCCATTAT CAAAGACAGA TGTTTAACAT AGTAGCCATC TAACTCCGTC TTCATCTCAA 14640 

CAGACAAAGT ATCACGCCCG TTAATTTGTG CCCATCCAGT TAACCCTGGC AAGATATCAT 14700 

TTGCTCCATA CTTATCTCTC TCTGCAATCA AATCTAGTTC ATTTATACCC GCTGGTCTAG 14760 

GACCTACAAT ACTCATATTA CCAACAAGAA TATTAAACAA TTGTGGTAGT TCATCCAAAG 14820 

ATGTTTTTCG CAAGAAAGCC CCTACTTTTG TAATCyATTG CTCTGGATTA TATAAGTTTC 14880 

GAGGCGCCAC ATTTTTAGGT GCATCTATTT TCATAGACCT AAATTTCAAA ATATAGAAGT 14940 

ATTCTTTATG AATACCAAAG CGTTTTTGCT TAAATATAAC CGGACCTTCT GAATCAAGTT 15000 

TAATCGCAAT TGCAATTATC ATAAAAACCG GACACAATAT TATTATCCCT ATTAAAGATA 15060 

ATAATATATC ACCTAATCGT TTTATTATAC CGTACATAAA CAACCTCCAA CTATAAATTC 15120 

TATTTCCATT TTTCATTCTA TTTCCATTTG ACAAATTAAA TCAGGCAGTA CATGCAACTA 15180 

CAGAAACTCA ATATATATTT GGTCACTCAA TGATTTTCAG AAATATAATT CTTTTATCCT 15240 

CTACGTCAGA TAAAACTTTT CTCCATCTAA ACAAAATTTA TTTGTTTCAG TAATATATGA 15300 

GTTCTCAATA ATGAATTAGA AGGTCCAGTT CAATTATTCT TCCAAATAGA CCGAATATTA 15360 

TTTGAAGACA TATCGGTTTC TGAAATTGCA ATCAGTACAT AAGCTAATAA ACTGATAAGT 15420 

ATGCTCTGTA AGAATGCCAG AGTTATATTG TAGTCCCCTT CCATACTATA TTCATTTTAT 15480 

TTTTTACCAT AATTTCCATA GGAACCGTAA ACTCCATACT TATTAACCGA GATATCCAAT 15540 

TTATTTAAAA CAACTCCTAG GAACAGTTTC CCTGTTTGTT TTAATTGTTG TTTCGCTTTT 15600 

TGGATATCAC GTTTATTCGC CTCACCTGTT GCTGTTACCA AGATGGACGC ATCACACTTT 15660 

TGAGTGATAA TTGCCGCATC AATAACAATT CCAATAGGCG GTGTATCAAT AATGATATAA 15720 

TCAAAATATT TACGCAATGT TTCAATCATA TCATTAAAAT TTTTACTTTG TAACAAGGCT 15780 

GTAGGGTTTG GTGATACAGA TCCCGATTGA ACTACAAATA AATTTTCAAT ATTTGTATCA 15840 

CATAAACCGT GAGATAAATC AGCTGTCCCA GATAAAAATT CTGTTAGCCC TGT AATTI TT 15900 

TCACGAGATT TAAAAACTCC TAACATAACT GAATTTCGAG TATCGCCATC GATCAAAAGA 15960

GTTTTATAGC CTGCACGCGC AAACGACCAT GCTATATTTA TGGAAGTAGT TGTTTTTCCT 16020 

TCCCCAGGGT TAACAGAAGT AACGGAAATT ACTTTTAGTT TATCTCCGCT CAACTGTATA 16080 

TTTGTACACA AGGCATTGTA ATATTCTTCT GCCTTCTTAA TGAACTCCAG TTTTTTTTGT 16140 

GCTATTTCTA ATGTCGGCAT CCTTCTCTCC TATTTCAACT TACCCAAGTT TGGCACAACT 16200

CCCAAAAGTG TCATCTGCAA TGTATTTTCG ATATCTTCCG GACGTTTCAC ACGAGTATCC 16260

AAAAGTTCAA GATGAAGAAC TATAACACTA GTTCCAATCA CCCCTGCCAA AAAACCAATT 16320

AGTGTATTGC GTTTAATATT TGGCGAAGAC GGGGATATCG CCGGCCTTGC CTCCTCCAGT 16380

GTTGTCACGT CAGAAACACG AGTAATACTG ATAATTTTTT GAGCAGCTAC TTCTCTCAAA 16440 

GAGTTAGCGA TACGGCTTGC CTCTTCAGGA ACTCGATCAT TAACTGAAAT AGAGACAATA 16500 

CGGGTATCAA CTGGTACTGT CACTTTAATT TTATTAGCCA AACCTTTTGG CGTCAAATCT 16560 

AGTTTCAAAT CAGAAACAAC TTCCTCCAAA ACATCCTGCG AAAGGATAAT CTCACGGTAG 16620 

TCTTTTACCA GATAAGTTCC TGCCTGCAAA TCCTGATTTG TCAACCCCGG CTTGTCTCCT 16680 

TGATTGCGAT TCACTACGTA AATTCGCGTG GTACTCGTAT ATTCTGGCTT AACAATAAAA 16740 

GTGCTATATG CAAAAGCCCC CGCACCTGTC ACAAGTGCCA CTATTAAAAT CATTAGCTTG 16800 

CGTTTCCACA AGCTTTTAAC TAATTGAAAT ACATCGATTT CTATCGTATT TTGTTCTTTC 16860 

ATCATTTCTC CTAAATTAGT TGATCCATTA CAATTTTTCG AGGATTGTCT ATAAAAAGTT 16920 

CCTGAGCCTT CGCTTCTCCG TATTTTTGGG TAACAAGGTC ATATGCTTCT GCCATATGAG 16980 

GAGGTCTACC GTCTAGATTG TGCATATCAC TTGCAATGAC ATGAACCAAA TCCTGCTCTA 17040 

AAAAATACTG AGCTCTTTTT TTCATGAATT TATAACGTTC GCCAAAAAGT TTGGGTTTGA 17100 

GGACATGTGA ACTATTTACT TGCGTGTAAC AGCCCATATC GATCAGTTCT CGAACGCGTT 17160 

TTTCATTATT TTCAAGAGCA TCATAGCGCT CAATGTGGGC AATGACTGGA GTAATTCCCA 17220 

ACATCAAGAT CTTGCTCAAG GCGCTATGAA TATCGCGATA AGGAGTGTTC ATACTAAACT 17280 

CTATCAAGGC ATAACGACTA TCATTGAGGG TCGGAATCCG CTTTTTTTCC AGCTTATCCA 17340 

GAACATCTGG TGTGTAATAA ATTTCAGCCC CGTAAGCAAT GACCAAGTCA CTCGCCACTT 17400 

CCTTAGCTAT TTCCCGAACC TGAAGAAAGT TTTCTGCTAT CTTCTCTTCC GGAGTTTCAA 17460 

ACATGCCCTT GCGACGGTGA GAGGTAGAAA CAATGGTTCG CACCCCCTGT CTGTAGGATT 17520 

CTGCCAAGAG AGCCTTGCTT TCCTCTCTTG ACTTGGGACC GTCATCTACA TCAAAAACGA 17580 

TATGCGAATG GATGTCTATC ATTTCATCTA CCCTCCATCA CATCCTGTAT AGCTGCTTTA 17640 

ACTACAGCTA AACTACTATC ATCTATTTCC ATCACATAGA GGTTACTGTC TGGCATTGCA 17700 

TAAGAAGGAA GATCCATCCG ACCTGTCCCT TTTAAATCTT GAGAATTTAC TTTATAATTC 17760 

CCTCCACTTT CTAACTGAGC ATTGACCAAA TTTATCATGG TCTCAAGTGG CATATTTGTT 17820 

TGGATAGAAT CTTGCAAGCT ATTAATGATC GTACTATAAT TTTTCAGCAC TTCGGTTGAC 17880 

GTTAATTTTT GAAGGATAGC CACAATCACC TTTTGTTGAT GGCGCCCGCG GTCACGATCG 17940 

CCATCTGCTA GGGAGTAGCG CTCACGAACA AAACCGAGAG CCTGTTCTGA ATCAAGATGA 18000 

ACATTGCCTG CAGGGTAATA CTTTCCATTC GTATGGGCAG TAAATTCTTG ATCATTATAA 18060

ACATCAATTC CACCCAACAA ATCAATCAAT TTCAAAAACG AAGTGAAGTT CAATCGCACA 18120

TAGTAATTGA TATCCACTCC ATAGAGATTT TCTAAGGTGT GAATGGACGA ATCAACTCCA 18180

TAAATGCCCG CATGAGTCAA TTTATCTTTT TGATTATTTC CACCATCTGC GATTGGTACA 18240

TAGGCATCAC GTGGCGTTGT GGTCAAGAGG ATTTTCTTGG TATCTCGATT GACAGTCATC 18300

AGGATGTTGA CATCTGATCG CGACACCGAA CTAATAGGAC CATAGGTGTC AATTCCACTA 18360


ACATAGATAT 


TGAAAGACTG 


ACTCTTAGAC 


GTCTTAGGAG 


CTTCTACTTT 




18420 


CCCTTAGTAT 


AAATCTTTTT 


TATCTTCGAT 


GCGTAGTCTG 


GATACTCTGA 




18480 


TTTTCAAAGA 


CACTATTTAG 


GACAATGGCC 


TT AGTCTC CC 


CTGCAATCAA 


APTPTTWT A A 


18540 


GCTGCCAAGT 


AAGACGAACT 


CTGGTTGACC 


GTCAAATCGG 




TY2 A PTTV2 A T A 


XOOUU 


TCAGCTAGTA 


ATTTCTGAAT 


AT TTTC ATT A 






c a r* a rTfr"! 1 /"* 


XOO OU 


AGTTGCGTAA 


CATTTTCGAT 


CT CACT ATCT 




CCl AC A r"TY2 IT 


1 uflh 1 A 1 1 V- J. 




GAGTAATTAG AAGTCGCATT TAAACGATTG GTCAGTCCAA CAAACTGCTG TACTGCAAAG 18780

AGCGACACAG AGCTGACAAG GATAGAGAAC ACCAACAGAA AAATAGTAAA CTTTTCAGCT 18840

TTTTTATAGA TAATCAAGAG TAGCCCTACC AAGGCAACTA GTAGGACTAA CGCAGTTACC 18900

ACTAGATTAA GATATCTAAA AGCAAGGATA TTGTACTTAA AGATTAAGAA CAATAAAAAA 18960

CAAACTAACA ATAAATAAAT AGTCAGCAAA ACTATATTAA CACTTCGCTT CACTTTCTGT 19020

GAACGTGATT TTTTAAAACG TCTACTCATG ATTAATACCT ATACATTGAA CATTATACGA 19080

TTATATCACT TTTTTACGGT AATGTCTACA CCTTTATTTT TACTATCTGC ATCTTTAAGT 19140

ATCTTAGTAG ACTTCCCGCG AAACAAAAAT ATAGTAAAAT GAAATAAGAA CAGAACAAAT 19200

CGTTCAGGAC AGTCAAATCG ATTTCTAACA ATGTTTTAGA AGCAGAGGTG 19250



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21706 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

AAAGTTGAAA GACTGCTAGC TGTTTTTGAT ACCAATCGTT TCCAACTACA GAGCAAACAG 60 

TATACAAAGT TTGTTTTTGG ATGTAAGCTT CTTGATGGAC AATTCCAAGA AAATCAAGAA 120 

ATTGCTGACC TTCAATTTTT TGCCATTGAC CAACTGCCGA ACTTATCTGA AAAACGCATT 180 

ACCAAGGAGC AAATAGAGCT TCTTTGGCAG GTTTATCAAG GTCATAGGGG GCAATATCTT 240

GACTAAGAAG ATGATTATCG TATTTCTAAA TCCATTTTTA ACAACTAGCA TGGTATAATA 300

ATATGCAGGA AAATTTTGAA TTATGAGGAA GACTAGATGA ATTTATGGGA TATTTTCTTT 360

ACGACTCAGG CAACCGAGCC GCCCAAATTT GACCTTTTTT GGTATGTTAG CCTATTTACG 420

CTCTTAGCCT TAACCTTTTA TACAGCCCAT CGCTATCGTG AAAAGAAGGT TTACCAACGA 480

TTTTTCCAAA TCTTGCAGAC TGTTCAGTTA ATCCTTCTTT ATGGTTGGTA CTGGGTCAAT 540

CATATGCCAC TGTCAGAAAG CCTACCCTTT TACCATTGCC GTATGGCTAT GTTTGTGGTA 600

CTCTTGCTTC CTGGTCAATC CAAATATAAA CAATACTTTG CATTATTGGG AACATTTGGG 660

ACATTAGCAG CCTTTGTTTA TCCAGTGCCA GATGCTTACC CTTTTCCACA TATCACCATT 720

CTATCCTTTA TCTTTGGTCA TTTAGCACTC TTGGGGAACT CTCTAGTTTA TCTATTGAGA 780

CAGTATAATG CGCGATTGCT GGATGTGAAG GGAATTTTTC TCATGACCTT TGCCCTAAAT 840

GCCTTGATTT TTGTGGTCAA TTTGGTGACA GGTGGCGATT ACGGATTTTT GACAAAACCG 900

CCATTGGTTG GGGATCACGG TCTAGTAGCT AATTATTTAC TTGTTTCAAT TGTGCTGGTA 960

GCTACTATCA GTTTGACTAA GAAAATCTTA GAATTCTTTT TAGCTCAAGA AGCAGAAAAA 1020

ATGATTGCAA AGGAAGCTTA ACACAGAGCT TTCTTTTTTG CTCTTAGAGA GTTTTTACAA 1080

GCAGCTTATA AAATAAGAAT TTCTGAATAG ACAAACTCAA AAAATGGCTG GGAAATTTAG 1140

GAAAAAAGCA AGCACGATTA AATTTTTTGT GTTATAATAT TTTGTGAATA GCTATGCCTA 1200

TGTTTAGCTA TGGAATAATA CGAAGTGCGA AACTTGGAAG ATAGAGAGGA AGCGATGTAA 1260

TGGCTAGAGA AGGCTTTTTT ACAGGTCTAG ATATTGGAAC AAGCTCTGTC AAGGTGCTTG 1320

TGGCCGAGCA GAGAAATGGT GAATTAAATG TAATTGGCGT GAGTAATGCC AAAAGTAAAG 1380

GTGTAAAGGA TGGAATTATT GTTGATATTG ATGCAGCAGC AACTGCTATC AAGTCAGCCA 1440

TTTCCCAAGC GGAAGAAAAG GCAGGCATTT CGATTAAATC AGTGAATGTC GGCTTGCCTG 1500

GTAATCTTTT GCAGGTAGAA CCAACTCAGG GGATGATTCC AGTAACATCT GATACTAAGG 1560

AAATTACGGA TCAAGATGTT GAAAATGTTG TCAAATCAGC TTTGACAAAG AGTATGACAC 1620

CTGACCGTGA AGTCATTACC TTTATTCCTG AAGAATTTAT TGTGGATGGT TTCCAAGGGA 1680

TTCGTGACCC ACGTGGCATG ATGGGGGTTC GCCTTGAAAT GCGTGGTTTG CTTTATACAG 1740

GACCTCGTAC TATCTTGCAC AATTTGCGTA AGACGGTTGA GCGTGCAGGT GTTCAGGTTG 1800

AAAATGTTAT CATTTCACCA CTAGCAATGG TTCAGTCTGT TTTGAACGAA GGGGAACGTG 1860

AATTTGGTGC TACAGTGATT GATATGGGGG CAGGTCAAAC GACTGTCGCT ACAATCCGTA 1920

ATCAAGAACT CCAGTTCACA CATATTCTCC AAGAAGGTGG AGATTATGTA ACTAAAGATA 1980

TCTCCAAGGT TTTGAAAACC TCTCGCAAAT TAGCGGAAGG CTTGAAACTG AATTACGGGG 2040

AAGCCTATCC 


GCCTCTTGCA 


AGCAAAGAAA 


CCTTCCAAGT 


AGAGGTTATT 


GGAGAAGTAG 


2100 


AAGCAGTCGA 


AGTGACGGAA 


GCCTACTTGT 


CAGAAATTAT 


TTCTGCACGA 


ATCAAGCACA 


2160 


TCCTTGAACA 


AATCAAGCAA 


GAATTAGATA 


GAAGGCGTCT 


ATTGGACCTC 


CCTGGTGGTA 


2220 


TTGTCTTAAT 


CGGTGGGAAT 


GCCATTTTAC 


CAGGTATGGT 


TGAGCTTGCT 


CAGGAAGTCT 


2280 


TTGGCGTCCG 


TGTCAAGCTT 


TATGTTCCAA 


ATCAAGTTGG 


TATCCGTAAT 


CCAGCCTTTG 


2340 


CGCATGTGAT 


TAGTTTATCA 


GAATTTGCGG 


GTCAATTAAC 


AGAAGTTAAT 


CTTTTGGCTC 


2400 


AGGGAGCGAT 


AAAAGGTGAG 


AATGACTTAA 


GTCATCAGCC 


AATTAGTTTT 


GGTGGGATGC 


2460 


TGCAAAAAAC 


AGCTCAGTTT 


GTACAATCAA 


CGCCTGTTCA 


ACCAGCTCCT 


GCTCCAGAAG 


2520 


TAGAGCCGGT 


GGCGCCTACA 


GAACCAATGG 


CGGATTTCCA 


ACAAGCTTCA 


CAAAATAAAC 


2580 


CGAAATTAGC 


AGATCGTTTC 


CGTGGATTGA 


TCGGAAGCAT 


GTTTGACGAA 


TAAAGAGGAA 


2640 


AAATAAATTA 


TGACATTTTC 


ATTTGATACA 


GCTGCTGCTC 


AAGGGGCAGT 


GATTAAAGTA 


2700 


ATTGGTGTCG 


GTGGAGGTGG 


TGGCAATGCC 


ATCAACCGTA 


TGGTCGACGA 


AGGTGTTACA 


2760 


GGCGTAGAAT 


TTATCGCAGC 


AAACACAGAT 


GTACAAGCAT 


TGAGTAGTAC 


AAAAGCTGAG 


2820 


ACTGTTATTC 


AGTTGGGACC 


TAAATTGACT 


CGTGGTTTGG 


GTGCAGGAGG 


TCAACCTGAG 


2680 


GTTGGTCGTA 


AAGCCGCTGA 


AGAAAGCGAA 


GAAACACTGA 


CGGAAGCTAT 


TAGTGGTGCC 


2940 


GATATGGTCT 


TCATCACTGC 


TGGTATGGGA 


GGAGGCTCTG 


GAACTGGAGC 


TGCTCCTGTT 


3000 


ATTGCTCGTA 


TCGCCAAAGA 


TTTAGGTGCG 


CTTACAGTTG 


GTGTTGTAAC 


ACGTCCCTTT 


3060 


GGTTTTGAAG 


GAAGTAAGCG 


TGGACAATTT 


GCTGTAGAAG 


GAATCAATCA 


ACTTCGTGAG 


3120 


CATGTAGACA 


CTCTATTGAT 


TATCTCAAAC 


AACAATTTGC 


TTGAAATTGT 


TGATAAGAAA 


3180 


ACACCGCTTT 


TGGAGGCTCT 


TAGCGAAGCG 


GATAACGTTC 


TTCGTCAAGG 


TGTTCAAGGG 


3240 


ATTACCGATT 


TGATTACCAA 


TCCAGGATTG 


ATTAACCTTG 


ACTTTGCCGA 


TGTGAAAACG 


3300 


GTAATGGCAA ACAAAGGGAA TGCTCTTATG 


GGTATTGGTA 


TCGGTAGTGG 


AGAAGAACGT 


3360 


GTGGTAGAAG 


CGGCACGTAA 


GGCAATCTAT 


TCACCACTTC 


TTGAAACAAC 


TATTGACGGT 


3420 


GCTGAGGATG 


TTATCGTCAA 


CGTTACTGGT 


GGTCTTGACT 


TAACCTTGAT 


TGAGGCAGAA 


3480 


GAGGCTTCAC 


AAATTGTGAA 


CCAGGCAGCA 


GGTCAAGGAG 


TGAACATCTG 


GCTCGGTACT 


3540 


TCAATTGATG AAAGTATGCG TGATGAAATT CGTGTAACAG 


TTGTTGCAAC 


GGGTGTTCGT 


3600 


CAAGACCGCG TAGAAAAGGT TGTGGCTCCA CAAGCTAGAT CTGCTACTAA CTACCGTGAG 


3660 


ACAGTGAAAC 


CAGCTCATTC 


ACATGGCTTT 


GATCGTCATT 


TTGATATGGC 


AGAAACAGTT 


3720 


GAATTGCCAA AACAAAATCC 


ACGTCGTTTG 


GAACCAACTC 


AGGCATCTGC 


TTTTGGTGAT 


3780 
TGGGATCTTC GCCGTGAATC GATTGTTCGT ACAACAGATT CAGTCGTTTC TCCAGTCGAG 3840

CGCTTTGAAG CCCCAATTTC ACAAGATGAA GATGAATTGG ATACACCTCC ATTTTTCAAA 3900 

AATCGTTAAG TAAATGAATG TAAAAGAAAA TACAGAACTT GTTTTTCGAG AAGTTGCAGA 3960 

GGCTAGTCTG AGTGCTCATC GAGAGAGTGG TTCGGTCTCT GTCATTGCAG TTACCAAGTA 4020 

TGTAGATGTA CCGACAGCGG AAGCCTTGCT TCCGCTAGGT GTCCATCATA TCGGTGAAAA 4080 

TCGTGTAGAT AAGTTTCTGG AAAAATATGA AGCTTTAAAA GATCGAGATG TGACTTGGCA 4140 

TTTGATTGGT ACCTTGCAAA GACGTAAGGT GAAAGATGTC ATTCAATACG TTGATTATTT 4200 

CCATGCATTG GACTCAGTAA AGCTAGCAGG GGAAATTCAA AAAAGAAGTG ACCGAGTCAT 4260 

CAAGTGTTTC CTTCAAGTAA ATATTTCTAA AGAAGAAAGC AAACACGGTT TTTCGAGAGA 4320 

GGAACTGCTG GAAATCTTGC CAGAGTTAGC CAGACTAGAT AAGATTGAAT ATGTTGGTTT 4380 

AATGACGATG GCACCTTTTG AGGCTAGCAG TGAGCAGTTG AAAGAGATTT TCAAGGCGGC 4440 

CCAAGATTTA CAAAGAGAAA TTCAAGAGAA ACAAATTCCA AATATGCCTA TGACCGAGTT 4500 

AAGTATGGGA ATGAGTCGTG ATTATAAAGA AGCGATTCAA TTCGGTTCCA CTTTTGTTCG 4560

TATAGGTACA TCATTTTTTA AGTAGGAGAG AACCATGTCT TTAAAAGATA GATTCGATAG 4620 

ATTTATAGAT TATTTTACGG AGGATGAGGA TTCAAGTCTC CCTTATGAAA AAAGAGATGA 4680

GCCTGTGTTT ACTTCAGTAA ATTCTTCACA GGAACCGGCT CTCCCAATGA ATCAACCTTC 4740 

ACAGTCGGCT GGCACAAAAG AGAACAATAT CACCAGACTT CATGCAAGAC AACAGGAATT 4800 

GGCAAATCAG AGTCAGCGTG CAACGGATAA GGTCATTATA GATGTTCGTT ATCCTAGAAA 4860 

ATATGAGGAT GCAACAGAAA TTGTTGATTT ATTGGCAGGA AACGAAAGTA TCTTGATTGA 4920 

TTTTCAGTAT ATGACAGAGG TGCAGGCTCG TCGTTGTTTG GACTATTTGG ATGGAGCTTG 4980 

TCATGTTTTA GCTGGAAATT TGAAAAAGGT AGCTTCTACC ATGTATTTGT TGACACCAGT 5040 

GAACGTTATT GTAAATGTTG AAGATATCCG TTTACCAGAT GAAGATCAAC AGGGTGAGTT 5100 

CGGTTTTGAT ATGAAGCGAA ATAGAGTACG ATAATGATTT TTTTAATTCG TATGATTTAT 5160 

AATGCAGTGG ATATTTACTC CCTGATTTTG GTAGCCTTCG CTGTCATGTC TTGGTTTCCA 5220 

GGTGCCTACG AATCCAGTTT AGGTCGTTGG ATTGTAGCGT TGGTGAAACC AGTGCTTGCT 5280 

CCCTTGCAAC GCCTGCCTTT ACAGATAGCG GGTCTTGATT TATCTGTTTG GGTTGCGATT 5340 

GTTTTGGTTC GATTTTTAGG AGAAAACCTA GTGCGTTTTC TGGCGATGAT AGGATGAATA 5400 

AAGGGATTTA TCAGCATTTC TCCATAGAAG ATCGTCCATT TCTTGACAAG GGAATGGAAT 5460 

GGATAAAGAA GGTAGAAGAT AGCTATGCTC CTTTTTTAAC TCCTTTTATC AATCCTCATC 5520 
AGGAGAAGCT ATTAAAGATT TTGGCCAAAA CCTATGGTCT TGCTTGTAGC AGTAGTGGGG 5580

AATTCGTCTC GAGTGAGTAT GTTCGAGTTT TATTATACCC AGATTATTTC CAACCAGAGT 5640

TTTCAGATTT TGAAATATCT CTCCAGGAAA TTGTGTATTC CAATAAATTT GAACATTTAA 5700 

CGCATGCTAA GATTTTAGGG ACAGTCATCA ATCAATTAGG GATTGAACGG AAACTTTTTG 5760 

GAGATATCCT AGTAGATGAA GAACGGGCGC AGATTATGAT TAATCAGCAG TTTCTTCTTC 5820 

TCTTTCAAGA TGGACTAAAG AAAATTGGTC GTATACCTGT TTCGCTGGAG GAACGTCCTT 5880 

TCACCGAGAA AATAGATAAG CTAGAACAGT ATCGAGAACT GGATTTATCT GTGTCTAGTT 5940 

TTCGATTAGA TGTTCTTTTA TCAAATGTTT TGAAACTATC TAGGAATCAA GCAAACCAGT 6000 

TGATTGAAAA GAAACTTGTC CAAGTAAATT ATCATGTGGT AGACAAATCA GATTACACTG 6060 

TTCAAGTTGG AGACTTGATT AGTGTGAGAA AATTTGGTCG CTTGAGATTA CTTCAAGATA 6120 

AGGGACAAAC GAAAAAAGAG AAGAAAAAAA TAACCGTCCA GTTATTATTA AGTAAGTGAG 6180 

GAATAGAATG CCAATTACAT CATTAGAAAT AAAGGACAAG ACTTTTGGAA CTCGATTCAG 6240 

AGGTTTTGAT CCAGAAGAAG TCGATGAATT TTTAGATATT GTGGTTCGTG ATTACGAAGA 6300 

TCTTGTGCGT GCGAATCATG ATAAAAATTT GCGTATTAAG AGTTTAGAAG AGCGTTTGTC 6360 

TTACTTTGAT GAAATAAAAG ATTCATTGAG CCAGTCTGTA TTGATTGCTC AGGATACAGC 6420 

TGAGAGAGTG AAACAGGCGG CGCATGAACG TTCAAACAAT ATCATTCATC AAGCAGAGCA 6480 

AGATGCGCAA CGCTTGTTGG AAGAAGCTAA ATATAAGGCA AACGAGATTC TTCGTCAAGC 6540 

AACTGATAAT GCTAAGAAAG TCGCTGTTGA AACAGAAGAA TTGAAGAACA AGAGCCGTGT 6600 

CTTCCACCAA CGTCTCAAAT CTACAATTGA GAGTCAGTTG GCTATTGTTG AATCTTCAGA 6660 

TTGGGAAGAT ATTCTCCGTC CAACAGCTAC TTATCTTCAA ACCAGTGATG AAGCCTTTAA 6720 

AGAAGTGGTT AGCGAAGTAC TTGGAGAACC GATTCCAGCT CCAATTGAAG AAGAACCAAT 6780 

TGATATGACA CGTCAGTTCT CTCAAGCAGA AATGGCAGAA TTACAAGCTC GTATTGAGGT 6840 

AGCCGATAAA GAATTGTCTG AATTTGAAGC TCAGATTAAA CAGGAAGTGG AAGCTCCAAC 6900 

TCCTGTAGTG AGTCCTCAAG TTGAAGAAGA GCCTCTGCTC ATCCAGTTGG CCCAATGTAT 6960 

GAAGAACCAG AAGTAGCTCC AATGCATCCG ATAGGTCCAA CACCAGCTAC AGAAACTGTT 7020 

GATTCAATAC CGGGATTTGA AGCACCGCAA GAATCTGTTA CAATTTTATA AGAAATATTC 7080 

TGAGAACAAT ATCTTATCCT TATATTTCCA GCGAGCAGGA GATGGTGTGA GTCCTGTAAT 7140 

CCCTATTGAT AAGATTATCC TCTCAAAAAC TCAAGTCTGA AGCTAGTAAG ATTTGACGTT 7200 

TCCCACGTTA CGGGATAAGA GGGAGAAAGA CTAAATCTTT TTCCGAATAA AGGTGGTACC 7260 

ACGATTTTCG TCCTTTTTGG AAGTCGTGGT TTTTAATTTG TTATTATTTA TAAAGGAGAT 7320 
ACCATGAAAC TCAAAGACAC CCTTAATCTT GGGAAAACTG AATTCCCAAT GCGTGCAGGC 7380

CTTCCTACCA AAGAGCCAGT TTGGCAAAAG GAATGGGAAG ATGCAAAACT TTATCAACGT 7440 

CGTCAAGAAT TGAACCAAGG AAAACCTCAT TTCACCTTGC ATGATGGCCC TCCATACGCT 7500 

AACGGAAATA TCCACGTTGG ACATGCTATG AACAAGATTT CAAAAGATAT CATTGTTCGT 7560 

TCTAAGTCTA TGTCAGGATT TTACGCACCA TTTATTCCTG GTTGGGATAC TCATGGTCTG 7620 

CCAATCGAGC AAGTCTTGTC AAAACAAGGT GTCAAACGTA AAGAAATGGA CTTGGTTGAG 7680 

TACTTGAAAC TTTGCCGTGA GTACGCTCTT TCTCAAGTAG ATAAACAACG TGAAGATTTT 7740 

AAACGTTTGG GTGTTTCTGG TGACTGGGAA AATCCATATG TGACCTTGAC TCCTGACTAT 7800 

GAAGCAGCTC AAATTCGTGT ATTTGGTGAG ATGGCTAATA AGGGTTATAT CTACCGTGGT 7860 

GCTAAGCCAG TTTACTGGTC ATGGTCATCT GAGTCAGCAC TTGCTGAAGC AGAGATTGAA 7920 

TACCATGACT TGGTTTCAAC TTCCCTTTAC TATGCCAACA AGGTAAAAGA TGGCAAAGGA 7980 

GTTCTAGATA CAGATACTTA TATCGTTGTC TGGACAACGA CTCCATTTAC CATCACAGCT 8040 

TCTCGTGGTT TGACGGTTGG TGCAGATATT GATTACGTTT TGGTTCAACC TGCTGGTGAA 8100 

GCTCGTAAGT TTGTCGTTGC TGCTGAATTA TTGACTAGCT TGTCTGAGAA ATTTGGCTGG 8160 

GCTGATGTTC AAGTTTTGGA AACTTACCGT GGCCAAGAAC TCAACCACAT CGTAACAGAA 8220 

CACCCATGGG ATACAGCTGT AGAAGAGTTG GTAATTCTTG GTGACCACGT TACGACTGAC 8280 

TCTGGTACAG GTATTGTCCA TACAGCCCCT GGTTTTGGTG AGGACGATTA CAATGTTGGT 8340 

ATTGCTAATA ATCTTGAAGT CGCAGTGACT GTTGATGAAC GTGGTATCAT GATGAAGAAT 8400 

GCTGGTCCTG AATTTGAAGG TCAATTCTAT GAAAAGGTAG TTCCAACTGT TATTGAAAAA 8460

CTTGGTAACC TCCTTCTTGC CCAAGAAGAA ATCTCTCACT CATATCCATT TGACTGGCGT 8520 

ACTAAGAAAC CAATCATCTG GCGTGCAGTT CCACAATGGT TTGCCTCAGT TTCTAAATTC 8580 

CGTCAAGAAA TCTTGGACGA AATTGAAAAA GTGAAATTCC ACTCAGAATG GGGTAAAGTC 8640 

CGTCTTTACA ATATGATCCG TGACCGTGGT GACTGGGTTA TCTCTCGTCA ACGTGCTTGG 8700 

GGTGTTCCAC TTCCTATCTT CTACGCTGAA GATGGTACAG CTATCATGGT AGCTGAAACT 8760 

ATTGAACACG TAGCTCAACT TTTTGAAGAA TATGGTTCAA GCATTTGGTG GGAACGTGAT 8820 

GCCAAAGACC TCTTGCCAGA AGGATTTACT CATCCAGGTT CACCAAACGG CGAGTTCAAA 8880 

AAAGAAACTG ATATCATGGA CGTTTGGTTT GACTCAGGTT CATCATGGAA TGGAGTGGTG 8940 

GTAAACCGTC CTGAATTGAC TTACCCAGCC GACCTTTACC TAGAAGGTTC TGACCAATAC 9000 

CGTGGTTGGT TTAACTCATC ACTTATCACA TCTGTTGCCA ACCATGGCGT AGCACCTTAC 9060 
AAACAAATCT TGTCACAAGG TTTTGCCCTT GATGGTAAAG GTGAGAAGAT GTCTAAATCT 9120

CTTGGAAATA CTATTGCTCC AAGCGATGTT GAAAAACAAT TCGGTGCTGA AATCTTGCGT 9180 

CTCTGGGTAA CAAGTGTTGA CTCAAGCAAT GACGTGCGTA TCTCTATGGA TATCTTGAGC 9240 

CAAGTTTCTG AAACTTACCG TAAGATTCGT AACACTCTTC GTTTCTTGAT TGCCAATACA 9300 

TCTGACTTTA ACCCAGCTCA AGATACAGTC GCTTACGATG AGCTTCGTTC AGTTGATAAG 9360 

TACATGACGA TTCGCTTTAA CCAGCTTGTC AAGACCATTC GTGATGCCTA TGCAGACTTT 9420 

GAATTCTTGA CGATCTACAA GGCCTTGGTG AACTTTATCA ACGTTGACTT GTCAGCCTTC 9480 

TACCTTGATT TTGCCAAAGA TGTTGTTTAC ATTGAAGGTG CCAAATCACT GGAACGCCGT 9540 

CAAATGCAGA CTGTCTTCTA TGACATTCTT GTCAAAATCA CCAAACTCTT GACACCAATC 9600 

CTTCCTCACA CTGCGGAAGA AATCTGGTCA TATCTTGAGT TTGAAACAGA AGACTTCGTC 9660 

CAATTGTCAG AATTACCAGA AGTTCAAACT TTTGCTAACC AAGAAGAAAT CTTGGATACA 9720 

TGGGCAGCCT TCATGGACTT TCGTGGACAA GCACAAAAAG CCTTGGAAGA AGCTCGTAAT 9780 

GCAAAAGTTA TCGGTAAATC ACTTGAAGCA CACTTGACAG TTTATCCAAA TGAAGTTGTG 9840 

AAAACTCTAC TCGAAGCAGT AAACAGCAAT GTAGCACAAC TTTTGATCGT GTCTGAGTTG 9900 

ACCATCGCAG AAGGACCAGC TCCGGAAGCT GCCCTTAGCT TCGAAGATGT AGCCTTCACA 9960 

GTTGAACGTG CTACTGGTGA AGTATGTGAC CGTTGCCGTC GTATCGACCC AACAACAGCA 10020 

GAACGCAGCT ACCAGGCAGT TATCTGTGAC CACTGTGCAA GCATCGTAGA AGAAAACTTT 10080 

GCGGAAGCAG TCGCAGAAGG ATTTGAAGAG AAATAAGATT GAAAAGTCTA GGCAAAATTC 10140 

AATTTGAGAA GAAAAGACAA CTAATTTTAT AGTCTATTAA ACGCATTGTA TCACGTTTTT 10200 

GAATACCTGA TATGATGCGT TTTTTATTTA TTTTAAAAAT TTGCGAGGTA TGACTTTTTA 10260 

TACTCAACAA GAATCAAAGA GAAACTTAGC AAGCTAACAG TAGTAAGATA AAATAGGAAT 10320 

TTGATATTAG GGATAAGATT GGTAAATAGT GTAATATTTT TACAACAATA AATTTATATA 10380 

GTTATTTCTG GTTTCTGAAA AGTATTATAT TTTATTTCAT ATTATACAAA TTTTTATTTT 10440

ATAATATCAG AACATACTTT TTTTAAAAGC AAATATGATA CAATTTTATT TGAAAAAAAT 10500 

AAAAAAGGAG ATTTTATTAT AAAATTAAAA AGACTTGCTT TAATTAGTGG TATCGTCGGT 10560 

CTTGTGGGAG GAATTTTACT TCTTATTGGT CCTTTTGTCT TGTTGGGAAT AGCGGTAAAC 10620 

ACAGCTGCTA CAACTCTTAA TGGAGGAGCT ACTGCAGGGG CTTTTTCAGG TGTAGCCTTA 10680 

CTCTTGAATG CCTTGAAGAT TGCAAATCTT GTTCTTGGTA TCATTGCTAT TGTTTACTAT 10740 

AAAGGAGATA AGCGTGTAGG TGCAGCTCCG TCTGTACTAA TGATTGTTTC TGGTGGAGTT 10800 

AGTCTCATTC TATTCCGTTC TTAGGATGGG TTGGGGGGAT TTTTGCTATT ATCGGAGGAT 10860 
CTCTATTCCT TTCAACATTG AAGAAATTCA AATCAGAAGA ATAAAAGGTA TTTTAGCATG 10920

AAAAGAACAA AAAAGTTTAT CGGTATAGGA GTAGCTCTAT TATCTCTTTC TCTTCTAGTT 10980 

GCATGTGGAA CATAAAGTTC AAAGAATACT TCAACAAGTA ATGATGAGAA GACAGTAGCA 11040 

ACATCCAATA GTTCAAAAGA AACAATCACT TTCGATACAC CGGTTGTAAC AGACGATGCG 11100 

ATTGAATCAA TACGCACTTA TGCAGATTAT ATAGATCTTT ATAAAAATAT TTTTGATGAT 11160 

TATTTTACTA AAGCTGAGGA AGGTTTCAAA GGCATAGCTA TGGAAAATAA TGACTCGTTT 11220 

ACTAAACTAA AAGAGTCAAC TCAAAAATTA TTCGATGCGC AGAAAAAAAG GTTAAATAAT 11280 

GAAGATAGAA TAGAAACAAC CAAAAACAAT GTGATTGCCA AACATTGTCA AACAGTCCTT 11340 

TCCTTTTTGG TTTTGACTAG CTTTTTTGTG AAAAATTGTG TAAAATAGAA TAGATAAACG 11400 

AGGGGAAACC TCGGAAAATT TAAAGGAGAA TCCATCTAAT GGTAAAATTG GTTTTTGCTC 11460 

GCCACGGTGA GTCTGAATGG AACAAAGCTA ACCTTTTCAC TGGTTGGGCT GATGTTGATT 11520 

TGTCTGAAAA AGGTACACAA CAAGCGATTG ACGCTGGTAA ATTGATCAAA GAAGCTGGTA 11580 

TCGAATTTGA CCAAGCTTAC ACTTCAGTAT TGAAACGTGC TATCAAAACA ACTAACTTGG 11640 

CTCTTGAAGC TTCTGACCAA TTGTGGGTTC CAGTTGAAAA ATCATGGCGC TTGAACGAAC 11700 

GTCACTACGG TGGTTTGACT GGTAAAAACA AAGCTGAAGC TGCTGAACAA TTTGGTGATG 11760

AGCAAGTTCA CATCTGGCGT CGTTCATACG ATGTATTGCC TCCAAACATG GACCGTGATG 11820 

ATGAGCACTC AGCTCACACA GACCGTCGTT ACGCTTCACT TGACGACTCA GTTATCCCAG 11880 

ATGCTGAAAA CTTGAAAGTG ACTTTGGAAC GTGCTCTTCC ATTCTGGGAA GATAAAATCG 11940 

CTCCAGCTCT TAAAGATGGT AAAAACGTAT TCGTAGGAGC TCACGGTAAC TCAATCCGTG 12000 

CCCTTGTAAA ACACATCAAA GGTTTGTCAG ATGACGAGAT CATGGACGTG GAAATCCCTA 12060 

ACTTCCCACC ATTGGTATTC GAATTCGACG AAAAATTGAA CGTCGTTTCT GAATACTACC 12120 

TTGGAAAATA AAAAATTGTA AGTCTAGAAT TGATTTCTAG GCTTTTTATG TTAGTATGGA 12180 

AGTATGATAA GGAATAAAAA ACAAGATTAT GTACTGGCCT ACAAGCAACC AGCTTCAACC 12240 

ACTTACATGG GTTGGGAAGA AGAAGCTTTA CCGATAGGCA ATGGTTCTTT AGGAGCAAAA 12300 

GTATTTGGCC TTATAGGGGC TGAACGGATT CAATTTAATG AAAAAAGTCT CTGGTCTGGA 12360 

GGTCCACTTC CTGATAGTTC AGATTATCAG GGTGGAAATC TTCAGGATCA GTATGTTTTT 12420 

TTAGCTGAGA TTCGGCAGGC TTTGGAGAAG AGAGATTACA ATCTGGCTAA GGAACTGGCT 12480 

GAGCAGCACC TAATTGGGCC AAAAACGAGT CAATATGGGA CCTATCTGTC TTTTGGGGAT 12540 

ATTCACATTG AGTTCAGCCA GCAAGGTACG ACTTTGTCTC AGGTGACGGA CTATCAGAGA 12600 




CAGCTGAATA TTAGTAAGGC ACTTGCGACG ACTTCTTATG TCTATAAGGG AACGCGATTT 12660 

GAACGTAAAG CTTTTGCGAG TTTTCCAGAT GATCTCTTGG TTCAATGTTT TACTAAGGAA 12720 

GGGTTGGAAA CTCTAGATTT TACTATAGAA CTATCCTTGA CCTGTGATTT GGCTTCTGAT 12780 

GGAAAGTATG AGCAGGAAAA ATCTGATTAC AAGGAGTGTA AGTTGGATAT TACTGATTCT 12840 

CATATCTTGA TGAAGGGAAG AGTTAAGGAT AATGATCTGC GGTTTGCTAG TTATCTAGCT 12900 

TGGGAAACGG ATGGAGATAT TAGAGTTTGG TCAGATAGGG TTCAGATATC AGGAGCCAGT 12960 

TATGCCAATC TCTTCTTGGC CGCTAAGACG GATTTTGCCC AAAATCCTGC TAGCAATTAT 13020 

CGCAAGAAAC TAGATTTAGA GCAACAGGTG ATAGACTTGG TGGACACAGC TAAAGAAAAG 13080 

GGCTATACCC AATTGAAATC AAGGCATATC GAGGACTACC AAGCCTTATT CCAGCGTGTT 13140 

CAATTGGATT TGGAAGCTGA TGTTGACGCA TCCACTACAG ATGATTTGTT AAAAAATTAT 13200 

AAGCCACAAG AAGGGCAGGC TTTGGAGGAG CTGTTCTTCC AGTATGGACG GTATTTATTG 13260 

ATTAGTTCGT CCAGAGACTG CCCAGATGCT CTACCAGCTA ACCTACAGGG AGTCTGGAAT 13320 

GCGGTCGACA ATCCTCCTTG GAATTCGGAC TATCACTTAA ATGTCAATCT GCAGCTGAAT 13380 

TATTGGCCAG CCTATGTTAC CAATCTCCTA GAGACGGTCT TTCCAGTCAT CAACTATGTA 13440 

GATGATTTGC GTGTCTATGG TCGTCTAGCG GCTGTAAAGT ATGCAGGAAT CGTCTCTCAG 13500 

AAAGGTGAGG AGAATGGTTG GTTGGTTCAT ACTCAAGCGA CTCCCTTTGG TTGGACGGCA 13560 

CCTGGTTGGG ATTACTATTG GGGTTGGTCA CCAGCTGCCA ATGCGTGGAT GATGCAAACC 13620 

GTTTATGAAG CCTATTTATT TTATAGGGAC CAAGACTATC TCAGGGAGAA AATTTATCCC 13680 

ATGTTGAGGG AAACGGTTCG TTTTTGGAAT GCCTTTTTAC ATAAGGATCA GCAGGCGCAG 13740 

CGTTGGGTGT CTTCTCCGTC TTATTCCCCA GAACATGGGC CGATTTCGAT TGGCAATACC 13800 

TATGACCAAT CTCTGATTTG GCAGTTATTT CATGATTTTA TTCAGGCTGC TCAGGAATTG 13860 

GGACTGGATG AGGACTTGTT GACTGAGGTT AAGGAGAAGT CTGATTTACT AAATCCTTTG 13920 

CAAATCACTC AATCTGGTCG AATCAGGGAG TGGTATGAGG AGGAAGAGCA GTATTTTCAA 13980 

AATGAGAAAG TGGAGGCCCA GCATCGGCAC GCTTCCCATC TAGTGGGACT CTATCCTGGC 14040 

AATCTCTTTA GCTACAAGGG ACAAGAGTAT ATTGAAGCGG CGCGTGCTAG CCTCAATGAT 14100 

CGTGGAGATG GCGGCACAGG CTGGTCCAAG GCTAATAAGA TCAATCTCTG GGCGCGTTTG 14160 

GGAGATGGCA ATCGAGCCCA TAAATTATTG GCAGAGCAGT TAAAGACATC CACCTTGCAA 14220 

AATCTTTGGT GTAGCCATCC TCCTTTTCAG ATAGATGGTA ATTTTGGTGC TACTAGTGGC 14280 

ATGGCAGAAA TGTTACTCCA GTCTCATGCA GCTTATCTGG TACCTCTAGC TGCCCTACCT 14340 

GATGCTTGGT CAACAGGTTC TGTTTCAGGC TTAATGGCAC GTGGACATTT TGAAGTGAGC 14400 
ATGAGCTGGG AAGATAAAAA ACTCTTACAG TTGACCATTT TATCAAGGAG TGGAGGAGAT 14460

TTGCGAGTTT CTTATCCAGA TATTGAGAAG AGTGTGATTA AAATGAATCA AGAAAAAATA 14520

AAAGCGAAAT GCATGGGGAA AGATTGTATT TCGGTGGCAA CAGCAGAAGG TGATCTTGTT 14580

CAATTTTATT TTTAAGAAGA TGTTATAAGG CAGTAATTTG AAACTGCCTT TTAATAAGGA 14640

TTTAAGAATA TAAGCAGTTT TCAACTAGTT GAAAAAACGT TATAATGATA ATAGGAAGTA 14700

ATACTCAATG AAAATCAAAG AGCACAAACT AGGAAGCTAG CCGCAGGTTG CTCAAAACAG 14760

TGTTTTGAGG TTGCAGATGG AAGCTGACGT GGTTTGAAGA GAGATTTTCG AGGAGTATAA 14820

TTTGTTTGAT AGAGGGTGGG TCTGATGGCT TATATTGAGA TGAAACACTG TTACAAGCGT 14880

TATCAGGTTG GGGACACGGA GATTGTGGCC AATTGTGATG TGAATTTTGA GATTGAAAAG 14940

GGGGAGCTGG TTATTATCCT TGGTGCTTCA GGTGCAGGCA AGTCAACAGT TCTTAACCTT 15000

CTTGGGGGAA TGGATACCAA TGATGAAGGG GAAATCTGGA TTGATGGTGT TAATATTGCG 15060

GATTATAGTT CCCACCAGCG CACCAATTAC CGTAGAAATG ATGTGGGGTT TGTTTTTCAG 15120

TTTTATAATC TAGTTTCTAA TCTGACAGCT AAGGAAAATG TGGAACTGGC TTCTGAAATT 15180

GTGACAGATG CCTTGAATCC TGATCAGGCC TTGACAGATG TAGGTCTGGC TCATCGTCTC 15240

AATAACTTTC CAGCCCAGCT TTCTGGAGGG GAGCAACAGC GAGTCTCCAT TGCACGCGCG 15300

GTAGCCAAAA ATCCTAAAAT TCTCCTTTGT GATGAACCGA CTGGAGCCTT GGATTATCAG 15360

ACGGGCAAGC AGGTTTTGAA AATTCTCCAA GACATGTCTC GTCAAAAGGG AGCGACGGTG 15420

ATCATCGTGA CTCATAATGG AGCTTTGGCG CCCATTGCTG ATCGCGTGAT TCAAATGCAC 15480

GATGCCAGTG TCAAGGATGT GGTGCTCAAC CAGCATCCTC AGGATATTGA CAGTTTGGAG 15540

TACTAGCATG ATCAAGCGAA AAACTTATTG GAAGGACTTA GTTCAGTCCT TCACAGGCTC 15600

CAAGGGGCGT TTTTTATCCA TCTTGATCCT GATGATGTTG GGATCTCTAG CCTTAGTAGG 15660

CCTCAAAGTA ACCAGTCCCA ACATGGAGGC GACAGCTAAT GCTTATTTAA CAACTGCTCA 15720

AACCTTGGAT TTGGCAGTCA TGTCTAACTA TGGCTTGGAT CAAGCAGACC AAGAAGAACT 15780

AAAACAGACG GAGGGCGCAG AGGTCGAGTT TGGCTATTTG ACAGATGTGA CTATGGATAA 15840

TGGGCAGGAT GCCATTCGGC TGTACTCCAA ACCAGAGCGA ATTTCAACCT TTCAGCTAAG 15900

AAAGGGACGA CTTCCTCAGT CAGACAAGGA AATCGCTTTG GCCACTCATT TGCAAGGCCA 15960

ATACAGCGTG GGACAGGAGA TTAGTTTTAA AGAAAAAGAA GAGGGTCATT CCTCTTTAAA 16020

AGACCATACT TATACCATTA CTGGTTTTGT GGATTCGGCT GAAATCCTCT CCCAGCGAGA 16080

TATGGGCTAC GCAGGAAGTG GAAGTGGGAC TCTGACAGCC TATGGGGTGA TTTTACCTAG 16140

TCAATTTGAT CAGAAAGTCT ACAATATAGC TCGTTTGAAA TATCAAGATT TAGCGGGTTT 16200

AAATGCCTTT TCATCAGCTT ATGAAGAAAA ATCCAAGCAA CATCAAGAAG AGCTTGAACA 16260 

AATTTTATCA GATAATGGCA AGGTACGTCT GCAACTTTTG AAAAAAGAAG GACAAGAGTC 16320 

TCTAGACAAG GGGCAAGAGA CCCTTGACAA GGCTCAGACT AATTTGCAGG AAGGCAAGCG 16380 

TCGTTTAGCA GCTGCTCAAG CTCGTATACA GGCTCAAGAA AGTCAACTAG CCTTGTTTCC 16440 

TCAAGTTCAG AGAGAGCAGG CTAGTGCTCA ACTTACCCAA GCCAAGCAGG AATTGGGCAA 16500 

GGAAGAGGAC AAACTAAAGC AAGCTGAACA AAATCTAGCC CAAGAAAAGG AAAAATTAGA 16560 

AAAACATCAG CAAGTCTTGG ATGATTTGGC GGAGCCAAGG TATCAGGTTT ATAATCGTCA 16620 

GACCATGCCA GGTGGTCAGG GCTATCTTAT GTATAGCAAT GCTTCATCCA GTATTCGAGC 16680 

AGTGGGCAAT ATCTTTCCTG TGGTACTTTA TGCCGTAGCA GCCATGGTGA CCTTTACGAC 16740 

CATGACTCGC TTTGTAGACG AAGAGCGAAC TCATGCAGGG ATTTTTAAGG CCTTGGGTTA 16800 

TCGTAGTAAG GATATTATCG CCAAGTTTCT CCTTTATGGA CTAGTAGCTG GGACTGTCGG 16860 

AACGGCTCTA GGTAGTATAC TTGGTCATTA TTTGCTAGCC AGTGTAATTT CAAGTGTCAT 16920 

TACAAAAGGC ATGGTGGTGG GAGAAACTCA GATTCAGTTC TATTGGACCT ATAGCTTACT 16980 

AGCTTTTGTC TTGAGCTTGT TGGCGAGTGT GTTACCAGCC TATCTGGTGG CTTGGAGGGA 17040 

ACTTCATGAC GAAGCAGCCC AGCTTCTACT TCCTAAACCT CCTGTCAAAG GAGCTAAAAT 17100 

CTTATTGGAG CGTATCGGTT TTATCTGGCG TCGTCTCAGT TTTACTCATA AGGTAACAGC 17160 

CCGCAACATC TTTCGTTATA AGCAGAGAAT GTTGATGACA ATCTTTGGTG TGGCAGGTTC 17220 

TGTAGCTCTG CTCTTTGCAG GTTTGGGAAT CCAATCTTCT GTAGCAGGAG TTCCGTCTAA 17280 

ACAGTTTCAA CAAATCCAAC AGTATCAGAT GCTTGTCTCT GAAAATCCTA GTGCGACCAA 17340 

TCAGGACAAG GTAGAGCTAG CAGAAGTGTT GAAAGGGCAG GAGATACTAG CCTACCAGAA 17400 

AATCTATTCT AAAGCGCTAT ACAAGGATTT CAAAGGCAAA GCTGGTCTTC AAAACATTAC 17460 

TCTTATGATG ATAGAGAAGG AAGATTTGAC TCCCTTTATC CATCTTCAAC ATCATCAGCA 17520 

GGAGCTGACA TTAAAAGATG GCATCGTTAT TACAGCTAAA CTCGCCCAGC TGGCAGGTGT 17580 

CAAGGTTGGG CAGACTTTAG AAATTGAAGG TAAGGAACTA AAGGTCGTTG CTATTACTGA 17640 

GAACTACGTT GGTCACTTTA TTTATATGAG TCAGGCTAGC TATGAGCAAC TTTACGGACA 17700 

GCTACCCCAA GCCAACACTT ATCTGGTCTC ATTAAGGGAT ACCAGTGCAA CTAGTATCGA 17760 

AAGTCAGGCG GGCTTGCTTA TGAATCAATC TGCGGTGTCC AGCGTTGTCC AAAATGCTTC 17820 

AGCCATTCGA CTCTTCGACT CTATCGCTAG CTCACTCAAT CAGACCATGA CCATCTTGGT 17880 

CATCGTATCG GTTCTATTAG CTATTGTCAT CCTTTACAAT CTGACCAATA TCAACGTAGC 17940 
TGAGAGAATC CGTGAACTCT CCACTATCAA GGTTCTTGGT TTTCATAATA ATGAAGTCAC 18000

CCTCTACATT TACCGTGAGA CGATTGTGCT GTCCCTTGTG GGAATCGTAC TTGGTCTGAT 18060 

AGCTGGTTTC TATTTACACC AATTTTTGAT TCAAATGATT TCGCCTGCGA CTATTCTCTT 18120

TTATCCGCAG GTAGGCTGGG AAGTCTATGT AATCCCAGTG GCAGCAGTAA GCATCATTTT 18180 

GACCTTGCTT GGTTTCTTCG TCAATTATTA TCTGAGAAAG GTTGATATGT TAGAAGCCCT 18240 

GAAATCTGTA GAGTAAGGTA GTTATTTTTA GCTGATTGAA CTTCTATTTA CTAATATTCA 18300 

AAAATCCTCC GTTTCAAAGA GCAGGGAACT CTTTGTGACA GAGGATTTTT TCTATAGGGC 18360 

TTTAGCAGCT GCAATTGCGG CTTCGAAGTT TGGCTCAGAA TTGATATTAT CCACGTATTC 18420 

AACGTAGCGA ATCGTATTGT CAGTATCGAG GACAAAGACT GCGCGTGCTA ATAGGTGCCA 18480 

TTCGTTGATC AAGAGGGCAT AATCGCGCCC GAAAGAATGG TCAAAGTAGT CTGAAAGCAT 18540 

AATGGCATTG TCAAGGCCTT CAGCACCGCA CCAACGTTTT TGAGCAAAAG GTAGGTCCAT 18600 

TGAAACAGTC AATACGACCG TGTTGTCCAG TCCAGCCAAT TCTTCATTAA AACGACGTGT 18660 

TTGAGTTGAG CAGATGCCTG TATCGATAGA AGGAACGACA CTCAAGACTT TTTTCTTGCC 18720 

ATCAAAATCA GCCAGAGATT TTTTAGAAAG ATCTGTTGTA GTAAGAGAAA AATCAAGCGC 18780 

CTTGTCGCCG ACTTGTAGTT GTTTACCTGT AAAGCTCACA GGATTTCCGA GAAAAGTTAC 18840 

CATAGGATAC TCCAATCTTT TTTCTTCCAT TTTAGCTGAA ACAGTCGGAA TTTTCCAATG 18900 

ATTTGACCGG AAATATGGGC ATAGAAAAAA CGCCAGCTCA TGTGAGAATG ACGTTTTTCA 18960 

TAGGTTTATT TTGCCAATCC TTCAGCAATC TTGTCAAGGT TGTATTTCAT CATGCTGTAG 19020 

TAGCTGTCGC CTTCTTTACC TTGTTCTGCG ATAGAGTCAG TAAAGATTTG AGCGTAGATT 19080 

GGGATGTTTG TGTCTTGAGA AACAGTTTTC ATTGGACGGT CATCCACACT TGATTCTACA 19140 

AAGAGTGATG GAACTTTTGT TTGGCGAAGT TTTTCAACCA AGGTCTTGAT TTGTTCAGGA 19200 

GTTCCTTCTT CTTCAGTATT GATTTCCCAG ATGTAAGCAC TTGGGACACC ATAGGCTTTA 19260

GAGAAGTATT TGAATGCTCC TTCGCTGGTT ACAATGAGTT TCTTTTCAGC AGGGATCTTA 19320 

TTAAATTTAT CCTTACTTTC TTTATCAAGT TTGTCTAACT TATCAGTATA TTCTTTGAGA 19380 

TTTTTTTCAT AGAATTCTTT ATTGTTAGGG TCTTTGGCGC TCAATTGTTT GGCGATATTT 19440 

TTAGCAAAAA TAATACCGTT TTCAAGGTTA AGCCAAGCGT GTGGGTCTTC TTTTCCTTTT 19500 

TCATTTTGAC CTTCAAGGTA GATAACATCA ACGCCGTCGC TGACTGCGAA GTAGTCTTTG 19560 

TTTTCAGTTT TCTTGGCATT TTCTACCAAT TTTGTAAACC AAGCATTGCC ACCTGTTTCA 19620 

AGGTTGATAC CGTTATAGAA AATCAAATTA GCCTCAGAAG TTTTCTTAAC GTCTTCAGGA 19680 
AGTGGTTCGT ATTCGTCTGG GTCTTGCCCA ATCGGAACGA TACTATGAAG GTCAATTTTG 19740

TCACCAGCAA TATTTTTAGT AATATCAGCG ATGATTGAGT TTGTAGCAAC AACTTTTAGT 19800 

TTTTGACCAG AAGTTGTATC TTTTTTTCCG CTAGCACATG CTACAAGAAT GATTGCAGAA 19860 

AGAAAGAGAA CGAGTAATGT ACCTAATTTT TTCATTAGAT CCTCCAATTT ATTAGGGCTT 19920 

TGCCCCTTAT TTTAACAAAT GTTTATTTTT CAGTTTCAAA TATCGTTGTT TGGGAGCGAT 19980 

AAAGAAGCTA ATGAGAAAGA AACTAGCAGC TGTAAGCACG ATACTAGAAC CTGCCGCAAC 20040 

ATTAAAACTA TAGCCAATAA AGAGTCCCAA AACTGAAGCA GTAGCTCCGA AGGTTGAGGA 20100 

AAGGAAAATC ATACTTTTCA GACTATTAGC ATACAGATAA GCAGTTGCAG CTGGGGTAAT 20160 

CAGCATGGCT ACAATCAGGA TAGTTCCGAC ACTTTGCATG GCTGTCACAG ACACGAGAGT 20220 

CAGGAGTACC ATGAGAAGGT AGTGATAGAA ATTGACAGGC ATTCCCATGG CTTTAGCCAA 20280 

GAGTTCATCA AAGGAAGTTA TCAAGAGTTG CTTGAAGAAA ATCCAGATTA ACAAGAGGAT 20340 

AGCTGCCCCC ACACCCATAG TAATAAACAT ATCCGTATCT TGGACGGCCA GGATATTACC 20400 

AAAAAGGATA TGGAAAAGGT CAGTTGAACT TTTAGCGACA CCAATCAAGA TGATACCGAG 20460 

GGCTAAGAAA GAAGAAAAGG TAATGCCGAT GGCGGTATCG CTTTTGATAA TCGAGTTTCC 20520 

TTTGATGTAG GTAATGATGA TGGCAGCTAG CAATCCAAAG ACAATGGCTC CGATAAAGAA 20580 

GTCAAGGCCC AAGATGAAGG ATAGGGCTAC ACCTGGTAAG ACAGCATGTG AAATGGCATC 20640 

TCCCATGAGT GACATCCCGC GTAGAATAAT GAAACATCCC ACAGCTCCAG CTACAATCCC 20700 

GACGACAATA GCTGTTATCA AGGCATTTTG TAGGAAATGG AATTTTTGCA ATCCATCGAT 20760 

AAATTCTGCA ATCATAGGTC ACCTCCATTG AAAAAGAGTT GATTACCGTA AGCTTCTTTT 20820 

AGATTGGTTT CGGTAAAAGT TTCTTTTGTT GGACCAAAGG CAATCACTTC TCGATTGACA 20880

AGTAAGACTT GATCGAAGTA GTGGGGAATC TTGCTGAGGT CGTGGTGAAC GATGAGAACC 20940 

GTCTTCCCAG CTTTTTTCAA ATCTCTCAGC GTATTCATGA TGATTTCCTC ACTGACAGAG 21000 

TCAATCCCAG CAAAGGGTTC ATCCAAGAGG ATATAGTCGG CTTCCTGCAC CAAACATCTG 21060 

GCAATCAAGA CCCGCTGGAA TTGACCTCCA GACAGTTGAC TAATTTGACG TTCAGCGTAG 21120 

TCAGCTAGGC CGACGATTTC AAGGGCCTCT TGCACTTTCT TCCAATGTTT AGCCTTTAAA 21180 


CTTCGAAAGA GAGGAATAGA GGGAAATAGT CCTAACGAGA CGCATTCCTT GACCTTGATG 21240 

GGAAAGTTGT AGTCGATATT GATTTTTTGT TCGACATAGG CAATTCGGTG TAAGGATTTT 21300 

TTAACTTCCT TGTCATCGAG AAATGCCTGA CCTTGATGTG GGATAATTCC CAACATACCT 21360 

TTTAATAGTG TTGATTTCCC AGCGCCGTTT GGACCAATGA TGCCGGTAAT TGTTGGTCCA 21420 

TGGAGCACTA GTGAAATATC CTTAAGTGCC AACGTTTCTT TGTAGGAGAC ACTGAGGTTT 21480 
TCGATACGTA TCATAAACTT GTATTCCTCC TGTCTCTTAA TATACATTAA AAAAAAAATT 21540

AAGTCAAGTT AATTTTTGAA AAAATTAAAA TAATAACTGA AAAATAGATT CTAAAGATAA 21600

CTTTCAGGAT AAATTTCTAA ATTATAAAAC GCATAGTATC AAGTGTAAAA AACTTGGAAT 21660

TATGCGTTTT ATCATGGAAA GATTTTTTAT AATAGCTAAA AAATAA 21706

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6171 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 



GATCCCCAGG AAAAACCGAG GTTTTCCCAA TCAATCGTTA CTGTCATATT CCACTCCTTA 60

TTCTAAAAAC CTATTTCTTA TATTCTACAC TATTTTTCTA AAATAGCAAG TATATTTTGT 120

AATTTTCAGA AAATTTCTCC AATAAAAACC AACTCTTAGA ACTGATTCTT CATTTCACTT 180

ATTTATCTTC AGTAACTACT TCCTGAAGAT AAGCGTCAAA AACTTCTTCA TCTGAAATCG 240

TGTCAGAAAT GAAGCTTCCA TTGCTAGTGC GTTCTGACAA GTTCAAGTCT TGCAATCGGC 300

TTTCATAGAT TGTTCCTTTA TTGGATTGGA CAAGCAGAGT TTGGTCGTTC ACATCCACTT 360

CCGTACTGAA GAAATCGCCA ACAAATCCTT GCTCTGCAAC TGCTCCTGCC AAGAAGACAC 420

GATGCGGTTT GTTTTTCAAC TCACGCAAGA CTTGTAATCC TCGTTTGGCA CGGCTGGTTG 480

CTAGAATTTC CTCAATGGAA ACACGTTTCA AGCTTCCACG CTGGGTCAAG AGGTAGAAGG 540

ACGAAGTATT ACAGATAAAG CCAGATTGGA GGACATCATC TTCTTTCAAA TTCATAGCCT 600

TGACACCTGC TGCCTTAGCA CCGACAACCG GAACCTCTTC GATATTGAAA CGCAGGGCAT 660

AACCATTTTG ACTAACCAAG ACAACATCAT CTAGTTTAAT CGGAGCCACT GCTACAATCT 720

GATCTGTATC GTCTTTGAGC TTAGCATACT TGACAGACTT AGATCTATAG GTCCGCCATG 780

GAGTGAATTC TTTTCGCTCT ACCCGTTTGA TTTGACCAAG GCGAGTCACT GCAAAGTAGG 840

TTGTCGCATC GTCAAACTGA TCCAGTACTT CCACATAAAG GATTTCTTCA TTCGTTTCAA 900

AGTTTGTGAT GGTTTGGCTC AGATGCTCTC CGATGTCCTT CCAACGAATA TCTGCCAACT 960

CATGGATTGG TCTGTAGATG ACATTTCCAA GACTTGTGAA CATCAAGAGG TGCTGGGTTG 1020

TCTTGGCAGA TTGAACAAAA ATCAAACGGT CATCATCACG CTTGCCAATT TCTTCCAAGG 1080

TGGAAGCCGC AAAGGAACGT GGACTGGTAC GCTTGATGTA ACCTGCCTTG GTCACGCTGA 1140

CGTAGGTATC TTCCTCAGCG ATAAGACTAG CTGTATCAAT CTCAATTGCT TTCGCACTGT 1200

CTTCTAAAGA ACTCAAACGA GGAGTTGCAA ATTTCTTCTT GACCTCACGA AGTTCTTTCT 1260

TCATGAGATT GTACATAGTC CTTTCATCAC CGATAATAGC CGCCAGCATA GCAATCTTCT 1320

CACGAAGCTC TGCTTCTTCT TCCTGCAAGA CAACCACATC GGTATTGGTC AAACGGTACA 1380

GTTGCAAAGT TACGATAGCC TCAGCCTGTT CTTCCGTAAA ATCATAGCTA ACTTTGAGGT 1440

TTTCCTTGGC GTCCGCCTTA TTCTCAGAAG CACGGATAAG AGCAATGACT TCATCCAAAA 1500

TCGAAATCAC ACGAATCAAA CCTTCGACGA TATGGAGACG TTTCTCAGCC TTTTCTTTGT 1560

CAAAGCGTGA ACGCGCCAAA ATCACTTCTC GACGGTGAGC GATATAGCTA GACAGGATTG 1620

GAACAATCCC AACCTGACGA GGTGTGAAAT TGTCAATCGC CACCATATTA AAGTTGTAGT 1680

TGATTTGTAG GTCGGTGTAC TTAAATAAGT AGTTGAGAAC AAGCTCAGTA TTAGCGTCTT 1740

TCTTAAGTTC GATAGCGATA CGAAGACCAT CACGGTCAGA CTCATCACGA ACCTCAGCAA 1800

TCCCAGCTAC CTTGTTATTA ACACGAACAT CATCGATTTT CTTGACTAGA TTGGCCTTAT 1860

TGATTTCATA AGGAATCTCA ATAATAACGA TTTGTTCCTT ACCACCTTTT AGCTTTTCAA 1920

TTTCAGTCTT GGAACGAACA ACCACGCGCC CTTTCCCAGT CTCATAAGCT TTCTTGATTT 1980

CATCACGACC CTGAATAATA GCCCCTGTAG GGAAGTCTGG TCCAGGCAAG AATTCCATGA 2040

GTTTATCAAT CTTTGCAGTT GGGTGGTCAA TCATGTAAAC TGCAGCATCT ATGACCTCAG 2100

CTAAATTATG GGGAGGAATG TCTGTGGCAT AACCAGCCGA AATCCCAGTC GAACCATTGA 2160

CCAAGAGGTT TGGAAAGGCT GCTGGCAAGA CCGTTGGTTC TTTCTCCGTA TCGTCAAAGT 2220

TCCATGCAAA AGGAACTGTC TTTTTCTCGA TATCCTGAAG AAGGTAGCCT GCAATTTCAG 2280

ACAAACGTGC CTCAGTATAA CGCATAGCCG CAGGAGGATC TCCGTCCATA GAACCGTTAT 2340

TACCGTGCAT TTCAACTAGA ATCTCACGAT TTTTCCAGTT CTGTGACATA CGAACCATGG 2400

CATCATAGAT AGAAGAATCC CCGTGTGGGT GGAAATTCCC CATGATGTTC CCGACTGACT 2460

TGGCCGACTT ACGGTAGCTC TTGTCAAAAG TATTGCTATC CTTATTCATA GAATAAAGAA 2520

TACGGCGCTG AACCGGCTTC AACCCATCAC GAATATCTGG CAAAGCCCGG TCTTGAATAA 2580

TGTACTTGGA GTAGCGACCA AAGCGCTCTC CCATGATGTC CTCCAGGGAC ATGTTTTGAA 2640

TGTTAGACAT AAGATACAAA GCCCATAAAA TACCAAGTGA AAATAGAAAA TTCTTGAAGT 2700

AAGCAAACTC ACAAGAGAAT TTATCTTTTT CACACAGTAT CTAGGGCGTG TTCAACTCCT 2760

TTCAAAGAAT GTAGAGTAGG TTTTTATGCA GTAAAAGATA TTTTACGGGA ATTCCTCCCG 2820

TGTTCAGTTA CGATAAGTAA CCAAACTATC CTGTTTGTAT TTTTCAATAT GAAAATCTGG 2880

TTTTCCAAAA TTAGTCTTAG TTTGTGTCTT AGCCGCTCCC TTAAGCGCCT CTTTGAGATA 2940

AGCACTCATA GCAGATTCTT CATTAATAAT CCTGCAATTT TTTCAAACCA AGATTTTCAA 3000

ACTGCTTTTT CACATAGTCA TTCACATCCG ACTCTAATTT CCAGTTTACT AACATATTAT 3060 

TTTCTTTCAT TAAAACACTG TCGTTTCTTC TAGCGTAAAC TTGACATTAT CTTCAATCCA 3120 

TTTACGGCGT GGTTCTACCT TATCTCCCAT GAGAACATTG ACGCGGCGTT CGGCGCGCGC 3180 

TAAATCTTCA ATTGTGACAC GGATGAGGGT ACGTGTTTCT GGGTTCATGG TTGTTTCCCA 3240 

GAGCTGGTCC GCATTCATCT CACCAAGTCC TTTGTATCGT TGGAGGGTAG CGCCTTTACC 3300 

GAACTGTTTA CGGAGTTCTT CTAGTTCTCC GTCCGTCCAA GCGTAGGCCA CTTCTTCTTT 3360 

CTTGCCTTTA CCTTTGGACA TCTTGTAAAG AGGTGGGAGG GCAATATAGA CATGACCTGC 3420 

CTCGACTAGC GGACGCATGT AACGGTAGAA AAATGTCAAG AGCAAGGTCT GGATATGGGC 3480 

ACCGTCGGTA TCCGCATCGG TCATGATAAT GATCTTATCA TAGTTGGCAT CTTCAATAGA 3540 

GAAGTCTGCT CCAACACCCG CACCAATGGT ATAAATCATG GTATTGATCT CTTCATTTTT 3600 

GAGGATATCC GCCATCTTGG CCTTGGCTGT ATTGACAACC TTACCACGAA GAGGTAGAAT 3660 

AGCCTGGAAC TTGCGGTCAC GACCTTGTTT GGCAGAACCA CCGGCAGAGT CCCCCTCAAC 3720 

TAGATAGAGT TCATTCTTAG CAGGATTCTT AGATTGGGCT GGGGTCAATT TCCCAGACAA 3780 

CAAGCCCTTA TCTTTCTTGT TTTTCTTCCC ATTTCGGCTC TCATCACGCG CCTTACGTGC 3840 

TGCTTCACGA GCATCACGGG CCTTGATAGC CTTGCGGATG AGGTTAGAAG CTAATTCCCC 3900 

ATTTTCCATA AGGAAAAAGG TCAACTTATC AGCCACTATT CCATCCACAA CTGGGCGAGC 3960 

TAGGGGGCTT CCTAGTTTAT CCTTGGTCTG TCCTTCAAAC TGCAAGTGTT CTTCAGGAAC 4020 

TAAGATAGAA AGAACGGCCG CTAGTCCCTC ACGATAGTCT GAACCTTCAA GGTTTTTATC 4080 

TTTTTCCTTG AGAAGACCTG TTTTACGTGC ATAGTCATTC ATGACCTTGG TAATGGCAGA 4140 

CTTGAGTCCT GTCTCGTGCG TTCCACCGTC CTTGGTGCGA ACGTTATTGA CAAAAGATAG 4200 

AATGTTATCT GAGAATCCGT CATTGTACTG GAGGGCTACT TCCACTTGAA AACCATTGTC 4260 

TTCCCCTTCA AAGTAAAGAA CTGGCGTCAA GATTTCCTTA TCTTCGTTGA GATAAGAAAC 4320 

AAAATCTTGT ACTCCATTCT CATAGTGGAA CTCAATCGCT TCATTTGTTC GCTTGTCCGT 4380 

TAAAGACAAG GTCACATTTT TCAAGAGAAA GGCTGATTCA TTAAGGCGCT CTGAAATGGT 4440 

ATTGTACTTG AAATCTGTCG TAGAAAATAT AGTCGCGTCA GGCATAAAAG TAACTTTGGT 4500 

GCCTGTTTTA GACTTGGGTG CTGTACCGAT TTTCTTCAAA GTCGTGACAG GTTTTCCACC 4560 

ATTTTCGAAA CGTTGCTTGT AAACTGCGCC ATCACGGGTA ATTTCAACTT CTAACCAGCT 4620 

AGAAAGGGCG TTAACAACGG AAGAACCCAC TCCGTGAAGT CCACCTGATG TCTTATAGCC 4680

ACCTTGACCG AATTTCCCTC CGGCATGAAG AATGGTAAAG ATAACCTCAA CAGTTGGAAT 4740

TCCCATAGCG TGCATACcTG TCGGCATCCC ACGTCCATGG TCTTGAACCG TTAGACTACC 4800

GTCTTTATTG ATAGTTACAT CAATACGATC ACCAAACCCA GACAAGGCTT CATCGACTGC 4860

ATTATCAACG ATTTCCCAAA CTAGGTGATG AAGACCAGCG CCATCGGTCG ATCCAATATA 4920

CATCCCTGGA CGTTTTCGGA CCGCATCCAA CCCTTCTAGC ACCTGAATAG CATCATCATT 4980

ATAATTGTTA ATATTGATTT CCTTTTTTGA CACAAGGAAC CTCCTATTCG TTCATCTTTA 5040

CTATTCTACA GGTTTTCCAA GGATTTTGCA AAATTTTTCT TTCTCCGATG TGACAATTTC 5100

AGCAGAGATT CTCTGCTTTT CTTTCCCAAT TCATGATATA ATAGGAGTAT GATTACAATA 5160

GTTTTATTAA TCCTAGCCTA TCTGCTGGGT TCGATTCCAT CTGGTCTCTG GATTGGACAA 5220

GTATTCTTTC AAATCAATCT ACGCGAGCAT GGTTCTGGTA ACACTGGAAC GACCAACACC 5280

TTCCGCATTT TAGGTAAGAA AGCTGGTATG GCAACCTTTG TGATTGACTT TTTCAAAGGA 5340

ACCCTAGCAA CGCTGCTTCC GATTATTTTT CATCTACAAG GCGTTTCTCC TCTCATCTTT 5400

GGACTTTTGG CTGTTATCGG CCATACCTTC CCTATCTTTG CAGGATTTAA AGGTGGTAAG 5460

GCTGTCGCAA CCAGTGCTGG AGTGATTTTC GGATTTGCGC CTATCTTCTG TCTCTACCTT 5520

GCGATTATCT TCTTTGGAGC TCTCTATCTT GGCAGTATGA TTTCACTGTC TAGTGTCACA 5580

GCATCGATTG CGGCTGTTAT CGGGGTTCTG CTCTTTCCAC TTTTTGGTTT TATCCTGAGT 5640


AACTATGACT 


CTCTCTTCAT 


t_l>^ inl in 1 v. 


TTAGCACTTG 


K. 1 Au 1 I IvjA 1 


TATCATTCGT 


5700 


CATAAGGACA ATATAGCTCG TATCAAAAAT AAAACTGAAA ATTTGGTCCC TTGGGGATTG 5760

AACCTAACCC ATCAAGATCC TAAAAAATAA AATGCCAGTT CTGTACTGCC CCCAAACAGT 5820

TAGACAAATA ATTTATCCAA AGGATTTAGT TCTGTACTGC ACAGGACTAA GTCCTTTTAG 5880

TTTTACCTTA ATTCGTTTGT TGTTGTAGTA ATCAATATAG TCTATAATGG CTTGTTCCAA 5940

TTGATTAAGT GATTTAAATG TTTTCTCATA GCCATAAAAC ATTTCGGATT TTAAAATGCC 6000

AAAGAAAGAT TCCATCCTAC CGTTGTCTTG GCTGTTGCCC TTACGTGACA TGGATGCTTG 6060

AATTCCCTTA CTCTCTAGGA ACCGATGATA AGAATCGTGT TGGTATTGCC AGCCTTGGTC 6120

ACTATGGAGA ATCGTATTCT CGTAGTGCTT CTCTGTGAAT GCCTGTTCCA A 6171



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18475 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

TATTACAAAT AAAAAAACGG AGGAGTGCTT TATGAAAGCC TATACTTATG TTAAACCAGG 60 

ACTTGCTTCT TTTGTTGATG TAGACAAACC AGTTATTCGC AAGCCAACAG ACGCTATTGT 120 

GCGTATTGTA AAAACCACTA TTTGTGGAAC AGACCTCCAT ATTATCAAAG GGGATGTTCC 180 

TACTTGCCAA AGTGGTACCA TTCTTGGCCA CGAAGGGATT GGGATTGTTG AAGAAGTTGG 240 

GGAAGGAGTT TCCAACTTCA AAAAAGGTGA CAAGGTCTTG ATTTCTTGCG TCTGTGCCTG 300 

TGGTAAATGC TACTACTGTA AAAAAGGAAT TTATGCTCAC TGTGAAGACG AAGGGGGCTG 360 

GATTTTCGGT CACTTGATTG ATGGTATGCA GGCTGAATAT CTACGTGTCC CTCATGCAGA 420 

TAATACTCTT TACCATACTC CAGAAGACTT GTCAGATGAA GCTTTGGTTA TGCTGTCAGA 480 

CATTCTGCCT ACTGGATATG AAATTGGTGT CTTAAAAGGG AAAGTAGAAC CTGGTTGCAG 540 

CGTAGCCATT ATTGGTTCAG GTCCAGTTGG ATTGGCTGCT CTTTTAACAG CCCAATTCTA 600 

TTCACCAGCT AAATTGATTA TGGTAGACCT AGACGATAAC CGCTTGGAAA CTGCCCTATC 660 

ATTCGGTGCG ACTCATAAGG TTAATTCTTC AGACCCTGAA AAAGCCATTA AAGAAATTTA 720 

TGATTTGACA GATGGTCGTG GTGTGGATGT CGCTATCGAA GCTGTTGGTA TTCCTGCAAC 780 

ATTTGATTTC TGTCAAAAGA TTATCGGTGT AGACGGAACG GTTGCCAACT GTGGTGTGCA 840 

TGGTAAACCA GTTGAATTCG ATTTAGATAA ACTTTGGATT CGCAACATCA ATGTAACAAC 900 

TGGTTTGGTA TCTACAAATA CGACTCCACA ATTGTTGAAA GCACTTGAAA GTCATAAGAT 960 

TGAACCGGAA AAATTGGTAA CTCACTATTT CAAACTCAGT GAAATTGAAA AAGCCTACGA 1020 

AGTCTTCAGT AAGGCAGCAG ACCACCATGC CATTAAGGTC ATTATCGAAA ACGATATCTC 1080 

AGAAGCCTAA GTAGTAAAAA TATTTTTGTA CATAAGTAAA TAGAAATTCA GTCATCCATC 1140 

AGATGGCTGG ATTTTTTATC AAAAAATTAA GAAATGAGCA TATTTCTTTC CTTGTCTGGC 1200 

GGAATTGGTT ATAATATACG GTACAAAGGA ATGAATGAAT ATGTATCGTG TTATAGAAAT 1260 

GTACGGAGAT TTTGAACCGT GGTGGTTCTT AGAAGGTTGG GAAGAAGATA TTGTAGCAAG 1320 

TAGAAAATTT GACCAGTATT ATGATGCTCT CAAATACTAC AAAACTTGCT GGTTTAGATT 1380 

GGAACAAGAA TCGCCTCTTT ATAAAAGTAG AAGCGACTTG ATGACCATTT TTTGGGACCC 1440 

GGAAGACCAA CGCTGGTGTG ATGAATGTGA TGAGTATTTA CAACAATACC ATTCTTTGGC 1500 

TCTTTTGCAG GATGAGCAGG TTATCCCAGA CGAAAAACTA CGCTCAGGCT ATGAAAAACA 1560 

AACCAGTCAG GAAAGGAATC GTTCTTGCCG TATGAAATTA AAATAGAGAA AAGTAACTTT 1620 

TTTGGAGTTG CTTTTTTTAT TTTTCTAACT CTTTGCGAAT AGTATAGGTG AGGAGGTAAG 1680

TATGGTTCAA GAAATTGCAC AAGAAATCAT TCGTTCAGCT CGGAAAAAAG GGACGCAGGA 1740

TATCTATTTT GTCCCTAAGT TAGACGCCTA TGAGCTTCAT ATGAGGGTAG GAGACGAGCG 1800 

CTGTAAAATT GGTAGCTATG ATTTTGAAAA GTTTGCAGCC GTTATCAGTC ACTTTAAGTT 1860 

TGTGGCGGGT ATGAATGTGG GAGAAAAAAG ACGTAGTCAA CTGGGTTCCT GTGATTATGC 1920 

CTATGACCAT AAGATAGCGT CTCTACGTTT ATCTACTGTA GGCGATTATC GGGGGCATGA 1980 

GAGTTTGGTT ATCCGTTTGT TGCACGATGA GGAGCAGGAC CTGCATTTTT GGTTTCAGGA 2040 

TATTGAAGAA TTAGGCAAGC AGTACAGGCA ACGGGGACTC TATCTTTTTG CTGGTCCGGT 2100 

TGGGAGTGGT AAGACGACCT TGATGCATGA ATTGTCCAAG TCACTCTTTA AAGGACAGCA 2160 

AGTTATGTCC ATCGAAGATC CTGTCGAAAT CAAGCAGGAC GACATGCTTC AGTTGCAGTT 2220 

GAACGAAGCA ATCGGCCTAA CCTATGAAAA TCTAATCAAA CTTTCCTTGC GTCATCGACC 2280 

AGATCTCTTG ATTATCGGAG AAATTCGTGA CAGCGAGACG GCGCGTGCAG TGGTCAGAGC 2340 

TAGTTTGACA GGTGCGACAG TCTTTTCAAC CATTCACGCC AAGAGTATCC GAGGTGTTTA 2400 

TGAGCGTCTG CTGGAGTTGG GTGTGAGTGA AGAAGAATTG GCAGTTGTTC TGCAAGGAGT 2460 

CTGCTACCAG AGATTAATCG GGGGAGGAGG AATCGTTGAC TTTGCAAGCA GAGATTATCA 2520 

AGAACACCAA GCAGCCAAGT GGAATGAGCA AATTGACCAG CTTCTTAAAG ATGGACATAT 2580 

CACAAGTCTT CAGGCTGAGA CGGAAAAAAT TAGCTACAGC TAAGCAAAAA AATATCATCA 2640 

CCCTATTTAA CAATCTCTTT TCTAGCGGTT TTCATCTGGT GGAGACTATC TCCTTTTTAG 2700 

ATAGGAGTGC TTTGTTGGAC AAGCAGTGTG TGACCCAGAT GCGTGTGGGC TTGTCTCAGG 2760 

GGAAATCATT CTCAGAAATG ATGGAAAGTT TGGGATGTTC AAGTGCTATT GTCACTCAGT 2820 

TATCCCTAGC TGAAGTTCAT GGCAATCTCC ACCTGAGTTT GGGAAAGATA GAAGAATATC 2880 

TGGACAATCT GGCTAAGGTC AAGAAAAAAT TGATTGAAGT AGCGACCTAT CCCTTGATTT 2940 

TGCTGGGTTT TCTTCTCTTA ATTATGCTGG GGCTACGGAA TTACCTGCTC CCACAACTGG 3000 

ATAGTAGCAA TATTGCCACC CAAATTATCG GTAATCTGCC CCAAATTTTT CTAGGCATGG 3060 

TAGGGCTTGT TTCCGTGCTT GCCCTTTTAG CACTCACTTT TTATAAAAGA AGTTCTAAGA 3120 

TGAGTGTCTT TTCTATCTTA GCACGCCTTC CCTTTATTGG AATCTTTGTG CAGACCTACT 3180 

TGACAGCCTA TTATGCACGT GAATGGGGGA ATATGATTTC ACAGGGAATG GAGTTGACGC 3240 

AGATTTTTCA AATGATGCAG GAACAAGGTT CCCAGCTCTT TAAAGAAGTC GGTCAAGATC 3300 

TGGCTCAAAC CCTGAAAAAT GGCCGTGAAT TTTCTCAGAC GATAGGAACC TATCCTTTCT 3360 

TTAGGAAGGA ATTGAGTCTC ATCATAGAGT ATGGGGAAGT TAAGTCCAAG CTGGGTAGTG 3420 

AGTTGGAAAT CTATGCTGAA AAAACTTGGG AAGCCTTTTT TACCCGAGTC AACCGCACCA 3480

TGAATTTGGT GCAGCCACTG GTTTTTATCT TTGTGGCACT GATTATCGTT TTACTTTATG 3540

CGGCAATGCT CATGCCCATG TATCAAAATA TGGAGGTAAA TTTTTAAAAT GAAAAAAATG 3600 

ATGACATTCT TGAAAAAAGC TAAGGTTAAA GCTTTTACAT TGGTGGAGAT GTTGGTGGTC 3660 

TTGCTGATTA TCAGCGTGCT TTTCTTGCTC TTTGTACCTA ATCTGACCAA GCAAAAAGAA 3720 

GCAGTCAATG ACAAAGGAAA AGCAGCTGTT GTTAAGGTGG TGGAAAGCCA GGCAGAACTT 3780 

TATAGCTTAG AAAAGAATGA AGATGCTAGC CTAAGAAAGT TACAAGCAGA TGGACGCATC 3840 

ACGGAAGAAC AGGCTAAAGC TTATAAAGAA TACAATGATA AAAATGGAGG AGCAAATCGT 3900 

AAAGTCAATG ATTAAGGCCT TTACCATGCT GGAAAGTCTC TTGGTTTTGG GACTTGTGAG 3960 

TATCCTTGCC TTGGGCTTGT CCGGCTCTGT CCAGTCCACT TTTTCAGCGG TAGAGGAACA 4020 

GATTTTCTTT ATGGAGTTTG AAGAACTCTA TCGGGAAACC CAAAAACGCA GTGTAGCCAG 4080 

TCAGCAAAAG ACTAGTCTGA ACTTAGATGG GCAGACGCTT AGCAATGGCA GTCAAAAGTT 4140 

GCCAGTCCCT AAAGGAATTC AGGCCCCATC AGGCCAAAGT ATTACATTTG ACCGAGCTGG 4200 

GGGCAATTCG TCCCTGGCTA AGGTTGAATT TCAGACCAGT AAAGGAGCGA TTCGCTATCA 4260 

ATTATATCTA GGAAATGGAA AAATTAAACG CATTAAGGAA ACAAAAAATT AGGGCAGTGA 4320 

TTTTACTGGA AGCAGTAGTC GCTCTAGCTA TCTTTGCCAG CATTGCGACC CTCCTTTTGG 4380 

GACAAATTCA AAAAAATAGG CAAGAGGAAG CAAAAATCTT GCAAAAGGAA GAAGTCTTGA 4440 

GGGTAGCTAA GATGGCCCTG CAGACGGGGC AAAATCAGGT AAGCATCAAC GGAGTTGAGA 4500 

TTCAGGTATT TTCTAGTGAA AAAGGATTGG AGGTCTACCA TGGTTCAGAA CAGTTGTTGG 4560 

CAATCAAAGA GCCATAAGGT CAAGGCTTTT ACCTTGTTAG AATCCCTGCT TGCCCTCATT 4620 

GTCATCAGTG GGGGATTACT CCTTTTTCAA GCTATGAGTC AGCTCCTCAT TTCAGAAGTT 4680 

CGCTACCAGC AACAAAGCGA GCAAAAGGAG TGGCTCTTGT TTGTGGACCA ACTTGAGGTA 4740 

GAATTAGACC GTTCGCAGTT CGAAAAAGTA GAAGGCAATC GCCTATACAT GAAGCAAGAT 4800 

GGCAAGGACA TCGCCATCGG TAAGTCAAAG TCAGATGATT TCCGTAAAAC GAATGCTCGT 4860 

GGTCGAGGTT ATCAGCCTAT GGTTTATGGA CTCAAATCTG TACGGATTAC AGAGGACAAT 4920 

CAACTGGTTC GCTTTCATTT CCAGTTCCAA AAAGGCTTAG AAAGGGAGTT CATCTATCGT 4980 

GTGGAAAAAG AAAAAAGTTA AGGCAGGTGT TCTCCTCTAC GCAGTCACCA TAGCAGCCAT 5040 

CTTTAGTCTT TTGTTGCAAT TTTATTTGAA CCGACAAGTC GCCCACTATC AAGACTATGC 5100 

TTTGAATAAA GAAAAATTGG TTGCTTTTGC TATGGCTAAA CGAACCAAAG ATAAGGTTGA 5160 

GCAAGAAAGT GGGGAACAGT TTTTTAATCT AGGTCAGGTA AGCTATCAAA ACAAGAAAAC 5220

TGGCTTAGTG ACGAGGGTTC GTACGGATAA GAGCCAATAT GAGTTTCTGT TTCCTTCAGT 5280

CAAAATCAAA GAAGAGAAAA GAGATAAAAA GGAAGAGGTA GCGACCGATT CAAGCGAAAA 5340 

AGTGGAGAAG AAAAAATCAG AAGAGAAGCC TGAAAAGAAA GAGAATTCAT AGTCAATTCA 5400 

ACTATAATGC GTTGAATCCA GAATAGTCCA CTGTAGTTTC TAGAAAATTG CTGGAAATGG 5460 

ATGTTAAGCT CCAATTCATT TGTTTATATC TTATTTCAGT TTACTATACT TTGTGCTAAA 5520 

TTAAAGATAT GAAACATGAT TTTAACCACA AAGCAGAAAC TTTCGATTCC CCTAAAAATA 5580 

TCTTCCTCGC AAACTTGGTA TGTCAAGCAG CCGAGAAACA GATTGATCTT CTATCAGACA 5640 

AAGAAATTTT AGATTTCGGT GGTGGCACGG GTCTATTAGC CTTGCCCCTA ACCCCTAGCC 5700 

AAGCAGGCTA AGTCAGTCAC TCTTGTAGAC ATTTCTGAGA AAATGTTGGA GCAAGCTCGT 5760 

TTGAAAGTGG AGCAGCAAGC AATCAAGAAT ATCCAGTTTT TGGAGCAAGA TTTACCGAAA 5820 

AATCCCTTGG AGAAAGAGTT TGATTGCCTT GCTGTTAGTC GGGTTCTTCA TCATATGCCT 5880 

GATTTGGATG CGGCTCTCTC ACTGTTTCAT CAACATTTGA AGGAAGATGG GAAACTCATC 5940 

ATTGCTGATT TTACCAAGAC AGAAGCTAAT CATCATGGAT TTGATTTAGC TGAACTGGAA 6000 

AACAAGCTAA TTGAGCATGG TTTTTCATCT GTGCATAGTC AGATTCTCTA TAGTGCTGAA 6060 

GACCTGTTTC AAGGAAATCA CTCAGAATTC TTTTTAATAG TAGCCCAAAA ATCACTCGCC 6120 

TAGTCAGGGA GTGATTTTTC TATAAGGATG GAAAAAAGAA GGGAAATTTG GTAAGATAGG 6180 

AATATGGATT TTGAAAAAAT TGAACAAGCT TATACCTATT TACTAGAGAA TGTCCAAGTC 6240 

ATCCAAAGTG ATTTGGCGAC CAACTTTTAT GACGCCTTGG TGGAGCAAAA TAGCATCTAT 6300 

CTGGATGGTG AAACTGAGCT AAACCAGGTC AAGGAGAACA ATCAAACCCT TAAGCGTTTA 6360 

GCACTACGCA AAGAAGAATG GCTCAAGACC TACCAGTTTC TCTTGATGAA GGCTGGGCAA 6420 

ACAGAACCCT TGCAGGCCAA TCACCAGTTT ACACCGGATG CTATTGCTTT GCTTTTGGTG 6480 

TTTATTGTGG AAGAGTTGTT TAAAGAGGAG GAAATTACTA TCCTCGAAAT GGGTTCTGGG 6540 

ATGGGAATTC TAGGCGCTAT TTTCTTGACC TCGCTTACTA AAAAGGTGGA TTACTTGGGA 6600 

ATGGAAGTGG ATGATTTGCT GATTGATCTG GCAGCTAGCA TGGCAGATGT AATTGGTTTG 6660 

CAGGCTGGCT TTGTCCAAGG AGATGCCGTT CGCCCACAAA TGCTCAAAGA AAGCGATGTG 6720 

GTCATCAGTG ACTTGCCTGT CGGCTATTAT CCTGATGATG CCGTTGCGTC GCGCCATCAA 6780 

GTTGCTTCTA GCCAAGAACA TACTTACGCC CATCACTTGC TCATGGAACA AGGGCTTAAG 6840 

TACCTCAAGT CAGACGGATA CGCTATTTTT CTAGCTCCGA GTGATTTGTT GACCAGTCCT 6900 

CAAAGTGATT TGTTAAAAGA ATGGCTGAAA GAAGAGGCGA GTCTGGTTGC TATGATTAGT 6960 

CTGCCTGAAA ATCTCTTTGC TAATGCCAAA CAATCTAAGA CTATTTTTAT CTTACAGAAG 7020

AAAAATGAAA TAGCAGTAGA GCCTTTTGTT TATCCACTTG CTAGCTTGCA AGATGCAAGT 7080

GTTTTAATGA AATTTAAAGA AAATTTTCAA AAATGGACTC AAGGTACTGA AATATAAAAT 7140

AGATTTTGTT ATAATAGTTG AAAACGCTTA AAAAGGGGTA TCATGTTATG ACAAAAACAA 7200

TTGCAATCAA TGCAGGAAGT TCAAGTTTGA AATGGCAATT ATACTTAATG CCAGAAGAAA 7260

AAGTATTGGC GAAAGGTTTG ATTGAACGTA TCGGTTTGAA AGATTCAATT TCAACTGTAA 7320

AATTTGACGG CCGTTCTGAA CAACAAATTT TGGATATTGA AAATCATATA CAAGCCGTTA 7380

AAATTTTATT GGATGACTTG ATTCGTTTCG ATATTATCAA GGCTTATGAC GAGATTACAG 7440

GTGTTGGACA TCGTGTTGTT GCTGGTGGAG AATATTTCAA AGAATCAACA GTTGTTGAGG 7500

GAGATGTTTT AGAAAAAGTT GAAGAGTTGA GTTTGTTGGC TCCTCTACAC AACCCGGCCA 7560

ATGCAGCAGG TGTTCGTGCC TTCAAGGAAT TGTTGCCAGA CATTACCAGT GTAGTTGTTT 7620

TTGATACTTC CTTCCACACA AGTATGCCAG AGAAAGCTTA TCGCTACCCT CTACCAACAA 7680

AATATTACAC AGAAAACAAG GTTCGTAAAT ACGGTGCTCA TGGTACAAGT CACCAGTTTG 7740

TAGCAGGAGA AGCTGCAAAA CTCTTGGGAC GTCCATTAGA AGACTTGAAG TTAATTACCT 7800

GTCATATTGG TAACGGAGGC TCAATTACAG CTGTGAAAGC CGGCAAATCT GTAGACACTT 7860


CTATGGGGTT CACTCCTCTT GGTGGTATTA TGATGGGAAC GCGTACAGGG GATATTGATC 7920

CAGCTATCAT TCCTTATTTA ATGCAATATA CAGAGGATTT TAACACACCA GAAGATATCA 7980

GTCGTGTTCT TAACCGTGAA TCAGGTCTTT TGGGAGTTTC TGCTAATTCT AGCGATATGC 8040

GCGATATAGA AGCAGCTGTA GCAGAAGGGA ATCACGAGGC TAGCTTGGCT TATGAAATGT 8100

ATGTTGACCG TATCCAAAAA CATATCGGTC AGTACCTTGC AGTGCTAAAT GGAGCAGATG 8160

CCATTGTTTT CACAGCAGGT GTCGGTGAAA ATGCAGAGAG TTTCCGTCGT GATGTAATCT 8220

CAGGGATTTC GTGGTTTGGT TGTGATGTTG ATGATGAAAA GAATGTCTTT GGCGTTACAG 8280

GAGACATCTC AACAGAGGCA GCTAAAATCC GTGTCTTGGT TATTCCAACA GATGAAGAAT 8340

TAGTCATTGC CCGTGACGTT GAACGCTTGA AAAAATAAGT GAAACTAAAA AAATATTCAA 8400

TACAAGGAGT TGGGAAAGTT ATTTTTCCAG CTTCTTTTTC TGATGAAATT GTCCAAAACC 8460

TTGCTATGAT TGGCTTTTTT GAAAAATATG GTATAATAGT AGTAATTTAA TAGATGGAGT 8520

TGAGTTTTGA AGAAAAACTT TCGTGTAAAA AGAGAGAAAG ATTTTAAGGC GATTTTCAAG 8580

GAGGGGACAA GTTTTGCTAA TCGCAAATTT GTGGTCTACC AATTAGAAAA CCAGAAAAAC 8640

CGTTTTCGAG TAGGTCTATC AGTTAGCAAA AAACTGGGGA ATGCCGTCAC TAGAAATCAA 8700

ATTAAGCGAC GGATTCGGCA TATTATCCAG AATGCAAAAG GGAGTCTGGT AGAAGATGTC 8760

GACTTTGTTG TCATTGCTCG AAAAGGAGTC GAAACCTTGG GATACGCAGA GATGGAGAAA 8820


AATCTACTCC ATGTATTAAA ATTATCAAAG ATTTACCGGG AAGGAAATGG GAGTGAAAAA 8880

GAAACTAAAG TTGACTAGTT TGCTAGGACT GTCTCTGTTA ATCATGACAG CCTGTGCGAC 8940

TAATGGGGTA ACTAGCGATA TTACAGCCGA ATCGGCTGAT TTTTGGAGTA AATTGGTTTA 9000

CTTCTTTGCG GAAATCATTC GCTTTTTATC GTTTGATATT AGTATCGGAG TGGGGATTAT 9060

TCTCTTTACG GTCTTGATTC GTACAGTCCT CTTGCCAGTC TTTCAGGTGC AAATGGTGGC 9120

TTCTAGGAAA ATGCAGGAAG CTCAGCCACG CATTAAGGCG CTTCGAGAAC AATATCCAGG 9180

TCGAGATATG GAAAGCAGAA CCAAACTAGA GCAGGAAATG CGTAAAGTAT TTAAAGAAAT 9240

GGGTGTCAGA CAGTCAGACT CTCTTTGGCC GATTTTGATT CAGATGCCGG TTATTTTGGC 9300

CCTGTTCCAA GCCCTATCAA GAGTTGACTT TTTAAAGACA GGTCATTTCT TATGGATTAA 9360

CCTTGGTAGT GTGGATACAA CCCTTGTTCT TCCGATTTTA GCAGCAGTAT TCACCTTTTT 9420


AAGTACTTGG TTGTCCAACA AAGCTTTGTC TGAGCGAAAT GGCGCTACGA CTGCGATGAT 9480

GTATGGGATT CCAGTCTTGA TTTTTATCTT TGCAGTTTAT GCGCCAGGTG GAGTCGCCCT 9540

ATACTGGACA GTGTCTAATG CTTATCAAGT CTTGCAAACC TATTTCTTGA ATAATCCATT 9600

CAAGATTATC GCAGAGCGCG AGGCCGTAGT ACAGGCACAA AAAGATTTGG AAAATAGAAA 9660

AAGAAAAGCC AAGAAAAAGG CTCAGAAAAC GAAATAAATA AGGAGGAATC TGGTAGTGGT 9720

AGTATTTACA GGTTCAACTG TTGAAGAAGC AATCCAGAAA GGATTGAAAG AATTAGATAT 9780

TCCAAGAATG AAGGCTCATA TCAAAGTCAT TTCTAGGGAG AAAAAAGGCT TTCTTGGTCT 9840

ATTTGGTAAA AAACCAGCCC AAGTGGATAT TGAAGCGATT AGTGAAACGA CTGTTGTCAA 9900

AGCAAATCAA CAGGTAGTAA AAGGCGTTCC GAAAAAAATC AATGATTTGA ACGAGCCTGT 9960

GAAGACGGTT AGTGAAGAAA CCGTTGACCT TGGTCATGTG GTTGATGCTA TTAAAAAAAT 10020


ag Ann A AH A A 


oo li.nnu\j 1 A 


1111* 1\JA 1 


J\\a 1 v. AAWjL I 


u AAA lull AA 


AACATGAAAG 


10080 


ACATGCCAGC ACTATCTTAG AAGAAACTGG TCACATTGAG ATTTTAAATG AACTTCAAAT 10140

CGAGGAAGCG ATGAGGGAAG AAGCAGGCGC TGATGACCTT GAAACTGAGC AAGACCAAGC 10200

TGAAAGTCAA GAACTAGAAG ACTTGGGCTT GAAAGTTGAA ACGAACTTTG ATATTGAACA 10260

AGTAGCTACG GAAGTAATGG CTTATGTTCA AACGATTATT GATGACATGG ATGTTGAGGC 10320

TACACTTTCA AATGATTATA ACCGTCGTAG CATCAATCTA CAAATTGACA CCAACGAACC 10380

AGGTCGTATT ATCGGCTACC ATGGTAAAGT CTTGAAGGCC TTGCAACTGT TGGCTCAAAA 10440

TTATCTTTAC AACCGCTATT CCAGAACCTT CTACGTTACA ATCAATGTCA ATGATTATGT 10500

CGAACACCGT GCAGAAGTCT TGCAGACCTA TGCGCAAAAA TTGGCGACTC GTGTTTTGGA 10560




AGAAGGGCGC AGTCATAAAA CAGATCCAAT GTCAAATAGC GAACGCAAGA TTATCCATCG 10620 

TATTATTTCA CGTATGGATG GCGTGACTAG TTACTCTGAA GGTGATGAGC CAAATCGCTA 10680 

TGTTGTTGTA GATACAGAAT AAGTAAAATC AGGTTTATCC TGATTTTTTG CTAGTTAGAG 10740 

GAGGTTAAAC TGATGTTGAA TAAGATAAGA GACTATTTAG ACTTTGCTGG TTTGCAGTAC 10800 

CGTAATCCTG ATAAAGCGGG AGCAGAGCGA GAGAAGATGC TGGCATTCCG CCACAAAGGA 10860 

CAAGAGGCCC GAAAGGTTTT TACAGAACTG GCCAAAGCCT TTCAAGCAAG CCATCCAGAA 10920

TGGCAACTCC AACAGACTAG CCAGTGGATG AATCAGGCCC AGCGTTTGAG ACCAGATTTT 10980 

TGGGTTTATC TACAGAGAGA CGGACAAGTG ACAGAACCTA TGATGGCCTT ACGTTTGTAT 11040 

GGGACATCTA CTGACTTTGG AATTTCTTTG GAAGTCAGTT TCATCGAACG TAAGAAGGAT 11100 

GAGCAAACAC TGGGCAAGCA GGCCAAAGTT TTAGACATTC CAACCGTTAA AGGGATTTAT 11160 

TATCTAACCT ACTCTAATGG TCAAAGTCAA CGGTGGGAGG CGAATGAAGA AAAGCGTCGT 11220 

ACTTTACGCG AGAAGGTGAG AAGTCAAGAA GTTCGAAAAG TTTTAGTGAA GGTAGATGTT 11280 

CCTATGACAG AAAATTCGTC TGAAGAAGAA ATCGTAGAAG GCTTATTGAA GTCTTATTCT 11340 

AAAATTCTTC CCTATTATCT AGCTACGAGA AAATAAGATA ATTTGTAAAA CATCATAAAT 11400 

CATACAGTCC AAGAGTGAAC AGTCCGCTGT GTAATTCTTG GTCTTTTTGT TTGCGCTTTC 11460 

GCATTATATA ATAAACTTAC AAAAACAATT CAAAAGGAGA ACAATTATGG AAGTCGTTTC 11520

AAGTGTTCTA AATTGGTTTT CTAGCAATAT TTTGCAGAAT CCCGCATTTT TCGTAGGTTT 11580 

ATTGGTGTTG ATAGGATATG CACTTTTGAA AAAACCTGCC CATGACGTTT TTTCAGGGTT 11640 

TGTTAAAGCA ACAGTAGGGT ATATGTTGCT TAACGTGGGT GCTGGTGGTT TGGTTACAAC 11700 

CTTTCGTCCA ATCTTAGCAG CTCTTAACTA CAAATTCCAA ATTGGTGCAG CGGTTATCGA 11760 

CCCTTACTTT GGACTTGCTG CAGCAAACAA CAAAATTGTA GCAGAGTTTC CAGATTTTGT 11820 

TGGAACTGCA ACTACAGCTC TATTGATTGG TTTTGGAATA AATATCTTGC TCGTAGCTCT 11880 

TCGAAAGATT ACGAAGGTAA GAACCCTCTT TATTACTGGT CACATCATGG TACAACAAGC 11940 

TGCAACAGTA TCTCTTATGG TTCTATTCTT AGTACCACAA TTGCGCAATG CTTACGGTAC 12000 

AGCAGCGATT GGTATCATCT GTGGACTTTA CTGGGCAGTT AGTTCAAATA TGACTGTTGA 12060 

GGCAACTCAA CGCTTGACTG GTGGTGGCGG ATTTGCGATT GGTCACCAAC AGCAATTTGC 12120 

AATCTGGTTT GTAGATAAAG TAGCAGGACG CTTTGGTAAG AAAGAAGAAA GTTTAGACAA 12180 

TCTTAAATTA CCTAAGTTCC TCTCAATCTT CCACGATACA GTTGTTGCAT CTGCTACCTT 12240 

GATGCTCGTA TTCTTCGGAG CCATTCTTTT AATCTTGGGT CCAGACATTA TGTCTAATAA 12300

AGAAGTCATC ACTTCAGGAA CTCTATTCAA TCCTGCTAAA CAAGATTTCT TTATGTACAT 12360

TATCCAAACA GCCTTTACCT TCTCAGTTTA CTTGTTCGTT TTGATGCAAG GTGTCCGAAT 12420 

GTTCGTATCT GAGTTGACAA ACGCCTTCCA AGGTATTTCA AACAAATTGT TGCCAGGTTC 12480 

ATTCCCAGCG GTTGACGTTG CAGCTTCTTA TGGATTTGGT TCTCCAAATG CTGTCTTGTC 12540 

AGGATTTACC TTTGGTTTGA TTGGTCAATT GATTACAATT GTTTTGCTCA TCGTCTTTAA 12600 

AAATCCGATT CTTATTATTA CAGGATTTGT ACCAGTGTTC TTTGACAATG CAGCCATTGC 12660 

GGTCTACGCT GATAAACGCG GCGGATGGAA AGCGGCTGTT ATCCTTTCCT TTATATCAGG 12720 

TGTCCTTCAA GTTGCTCTAG GAGCTCTTTG TGTGGCCCTT CTCGATTTGG CATCTTATGG 12780 

TGGCTACCAT GGAAATATCG ACTTTGAATT CCCATGGCTT GGATTTGGAT ATATCTTCAA 12840 

ATACCTTGGT ATTGTTGGTT ATGTACTTGT GTGTCTCTTC TTGCTTGTTA TTCCTCAACT 12900 

TCAATTTGCC AAAGCAAAAG ATAAAGAGAA ATATTACAAC GGTGAAGTTC AAGAAGAAGC 12960 

TTAGTATCTA GAAAAGGAGA AATAAAATGG TTAAAGTATT AGCAGCGTGC GGAAATGGAA 13020 

TGGGTTCATC AATGGTTATC AAGATGAAGG TTGAAAATGC TCTCCGTAAG CTTAATCAAA 13080 

CAGATTTTAC AGTCAATTCA TGCAGTGTCG GTGAAGCTAA AGGTTTAGCA GTAGGATATG 13140 

ACATCGTAAT CGCTTCTCTT CATTTGATTC AAGAATTGGA AGGGCGAACT AATGGGAAGT 13200 

TAATTGGGCT TGATAACTTG ATGGATGATA AAGAAATCAC CGAAAAACTC AGTCAAGCAC 13260 

TACAGTAAAA GGTTGGAGGG GGCTGGACAG AAACTGAGAG TTATCGTTTC TGTCCTTCTC 13320 

CCTCTTTAAA TAAAGGAGGC AGATATGAAT TTAAAACAAG CTTTAATTGA CAATGACTCG 13380 

ATCCGACTAG GTTTAGAGGC TAACAATTGG AAAGAAGCAG TCAAGGTAGC AGTAGATCCC 13440 

TTAATTGAAA GTGGGGCAAT TTTGCCAGAG TATTACGATG CTATCATTGA ATCGACTGAA 13500 

GAGTATGGGC CTTACTATAT CTTGATGCCA GGTATGGCTA TGCCCCACGC TAGACCTGAA 13560 

GCAGGTGTGC AAAGTGATGC CTTTTCATTG ATTACCTTAC AAAATCCTGT TGTATTTTCA 13620 

GATGGGAAAG AGGTATCTGT TTTGTTGGCA CTAGCAGCAA CAAGTTCAAA AATTCACACA 13680 

AGTGTAGCCA TTCCACAAAT TATTGCCCTA TTTGAATTAG AAGATTCTAT TGCACGTTTA 13740 

CAGGCTTGCC AGACTAAAGA AGATGTCTTG GCTATGATTG AAGAATCTAA GGATAGCCCT 13800 

TATCTCGAAG GATTGGATTT GGAAAGTTAG AAAGAGGAAT AAAGAAATGA CAAAAAGAAT 13860 

ACCTAATTTA CAAGTTGCAT TAGACCATTC AGACTTGCAA GGAGCGATTA AAGCAGCTGT 13920 

TTCTGTTGGT CAGGAAGTAG ATATTATCGA AGCTGGAACT GTTTGCTTGC TTCAAGTTGG 13980 

AAGTGAACTG GCTGAAGTCT TGCGTAGCCT TTTCCCAGAT AAGATTATTG TGGCAGACAC 14040 

AAAATGTGCT GATGCTGGTG GAACAGTTGC TAAAAATAAT GCGGTTCGTG GAGCAGACTG 14100

GATGACTTGT ATCTGTTGTG CAACCATCCC TACTATGGAA GCAGCTCTAA AGGCTATCAA 14160

GACTGAACGA GGAGAACGAG GCGAAATCCA GATCGAGCTT TATGGCGATT GGACTTTTGA 14220 

ACAAGCTCAG CTTTGGCTAG ATGCAGGTAT CTCACAAGCT ATTTATCACC AATCTCGTGA 14280 

TGCTCTTCTT GCTGGTGAAA CTTGGGGTGA AAAAGACCTT AATAAGGTTA AAAAACTCAT 14340 

TGACATGGGC TTCCGTGTAT CTGTAACAGG TGGTCTAGAT GTAGATACTC TCAAACTCTT 14400 

TGAAGGTATT GATGTCTTTA CCTTTATCGC AGGTCGTGGA ATTACAGAGG CTGTGGATCC 14460 

AGCAGGAGCA GCGCGTGCCT TCAAGGATGA AATCAAACGA ATTTGGGGGT AAATCATGGT 14520 

ACGTCCAATT GGAATTTATG AAAAGGCAAC CCCAACACAC TGTACTTGGC TAGAACGTTT 14580 

AAATTTTGCC AAGGAGTTAG GCTTTGATTT TGTCGAGATG TCTATTGACG AACGTGACGA 14640 

GCGTTTAGCA AGACTTGACT GGAGTAAGGA AGAACGCTTG GAAGTTGTCA AAGCAATCTA 14700 

TGAAACTGGT GTTCGTATTC CTTCTATCTG TTTTTCAGGC CATCGTCGCT ACCCATTGGG 14760 

TTCAAAAGAT CCAGTTCTAG AGGAAAAATC TCTAGAACTC ATGAAAAAAT GTATCGAATT 14820 

AGCTCAAGAC TTGGGAGTTC GTACGATTCA ATTAGCTGGT TACGATGTTT ACTATGAGGA 14880 

AAAGTCACCC CAGACACGCC AACGTTTTAT CAAAAATTTG AGAAAAGCCT GTGACTGGGC 14940 

TGAAGAAGCT CAGGTGGTAC TTGCTATTGA AATTATGGAT GATCCTTTCA TCAGTAGCAT 15000 

CGAAAAATAT TTGGCTATAG AAAAAGAGAT TGACTCTCCC TTCCTCTTTG TATATCCAGA 15060 

TATTGGTAAT GTGTCTGCAT GGCATAATGA TATCTATAGT GAGTTTTATC TTGGTCATCA 15120 

TGCCATCGCA GCTCTCCATC TCAAGGATAC TTATGCAGTG ACAGAAAGTT CAAAGGGCCA 15180 

GTTCCGAGAT GTACCTTTCG GGCAAGGTTG TGTCAAATGG GAAGAAGCTT TCGATATTTT 15240 

AAAGGAAACC AATTATAATG GACCTTTCCT AATCGAAATG TGGTCTGAAA ATTGTGAAAC 15300 

AGTAGAAGAA ACACGCGCAG CCATTCAAGA GGCGCAAGCT TTTCTCTATC CACTCATTAA 15360 

GAAAGCAGGT TTGATGTAAG ATGAATCAAG TAATCAATGC TATGCGTAAA CGAGTCTGTG 15420 

ATGCCAATCA ATCATTGCCA AAACATGGAC TTGTCAAATT TACCTGGGGG AATGTATCTG 15480 

AAGTTAATCG CGAACTCGGT GTCATTGTTA TCAAACCATC AGGCGTGGAT TATGACGAAT 15540 

TGACACCTGA AAACATGGTA GTGACTGATC TAGATGGTAA GATCCTAGAA GGGGATTTAA 15600 

GACCATCTTC CGACCTCCCA ACTCATGTGC AATTATATAA GACTTGGTCA GAAATTGGTA 15660 

GTGTGGTTCA CACCCATTCG ACAGAAGCTG TTGGTTGGGC TCAGGCAGGT CGTGATATTC 15720 

CTTTCTACGG AACAACCCAT GCAGATTATT TCTACGGTTC AATCCCTTGC GCCCGTAGTT 15780 

TGACCAAGGA CGAAGTAGAA GTGGCCTATG AAAAAGATAC TGGCCTGGTT ATCGTAGAAG 15840

AGTTTGAACA TCGCGGACTT AACCCGGTTG AAGTACCAGG AATTGTTGTA CGCAATCACG 15900

GTCCATTCAC CTGGGGCAAA AATCCAGAGA ATGCTGTTTA TCACTCTGTC GTACTAGAGG 15960

AAGTATCAAA GATGAATCGC TTTACAGAAC AAATCAATCC AAGAGTTGGA CCTGCTCCCC 16020

AGTACATACT AGAAAAACAC TACCAACGTA AACATGGACC AAATGCTTAT TATGGTCAAA 16080

AGTAAGAACG ATGAAGGAGG AGAAAAAGAT AAATTTAGCT CCTCTTTTTA CATTTGATTT 16140

TTATTGAGAG TAAAGTTGGA GTTGAAGTAA TTTTAAAAGA TTTTTTAGAA ATAGCGCTTG 16200

ATATATATAT GGTAAAATAA AAAGAATTGC TGTGATATCA ATAGATTTGG GGGATTTTTT 16260

AATATGGTAC TGGATAAGGC AAGTTGTGAT TTGCTTCAAT ATTTGATGGA TCAAGAAACG 16320

TCCAAAACGA TTATGGCGAT TTCGAAAGAT TTGAAAGAGT CAAGAAGGAA AATTTATTAT 16380

CACATTGACA AAATCAATGC TGCTCTGGGT GACGAGGCGC TTCACATCAT TAGTATTCCA 16440

CGAATTGGTA TTCACTTAAC GGAAGAGCAG AGAGATGCTT GTTGTAAACT ATTATCGGAA 16500

GTAGATTCGT ACGATTATAT CATGAGTGCG CATGAACGTA TGATGATAAT GTTACTATGG 16560

ATAGGTATTT CTAAAGAACG TATTACGATT GAAAAATTGA TAGAGTTAAC AGAGGTATCT 16620

AGGAATACTG TTCTCAATGA TTTGAATAGT ATTCGTTATC AACTAACTTT GGAACAATAT 16680

CAGGTGATCT TGCAAGTGAG CAAGTCACAG GGATACAACC TTCATGCCCA CCCTCTTAAT 16740

AAAATTCAGT ATCTTCAATC GCTTCTATAT CATATTTTTA TGGAAGAAAA TGCCACTTTT 16800


GTATCTATTT TAGAAGATAA GATGAAAGAG AGGTTAGATG ATGAGTGTTT GCTTTCTGTT 16860

GAAATGAACC AATTTTTTAA GGAACAGGTT CCTTTAGTTG AACAAGATTT AGGGAAGAAA 16920

ATAAACCATC ATGAAATAAC TTTTATGTTG CAGGTTCTAC CTTATTTGCT GTTAAGCTGT 16980

CATAATGTTG AACAGTATCA AGAAAGACAT CAGGATATAG AGAAAGAATT TTCTTTGATA 17040

AGAAAAAGAA TAGAGTATCA GGTGTCTAAG AAATTAGGAG AACGGTTGTT TCAAAAGTTT 17100

GAAATTTCTT TGTCAGGACT TGAAGTTTCT CTTGTAGCTG TTCTCCTCCT CTCCTATCGT 17160

AAAGATTTGG ATATTCATGC AGAAAGTGAT GATTTTCGGC AATTAAAACT TGCTTTAG/vA 17220

GAATTTATCT GGTATTTTGA ATCACAAATC CGAATGGAGA TTGAGAACAA GGATGATTTG 17280

TTACGAAATT TGATGATCCA CTGTAAAGCC TTGTTATTTA GAAAGACTTA CGGTATTTTT 17340

TCTAAAAATC CTCTAACAAA ACAAATTCGA TCCAAGTATG GAGAATTATT TTTAGTCACT 17400

AGAAAATCTG CGGAAATTTT AGAAGGAGCA TGGTTTATTC GGCTAACAGA CGATGATATT 17460

GCCTATTTGA CGATTCATAT TGGAGGATTT TTAAAATATA CACCATCATC TCAAAAAAAT 17520

ATGAAAAAAG TTTATCTCGT TTGTGATGAA GGTGTTGCGG TTTCGAGACT TTTGCTGAAA 17580

CAATGCAAAC TTTATTTTCC AAATGAGCAA ATTGACACTG TATTTACAAC AGAACAATTT 17640

AAGAGTGTGG AAGATATTGC ACAAGTTGAT GTAGTGATTA CTACTAATGA TGATTTGGAT 17700


AGCAGATTTC CGATTTTAAG GGTTAATCCT ATCCTTGAAG CAGAAGATAT TTTGAAAATG 17760

CTAGACTATC TTAAACACAA TATATTTCGT AATAAGAGCA AAAGTTTCAG TGAAAATCTT 17820

TCTAGTCTTA TTTCGTCTTA TATTGTAGAC AGCAAGTTGG CTAGTAAGTT CCAAGAAGAG 17880

GTTCAAACAC TTATAAATCA AGAAATAGTA GTTCAAGCTT TTTTGGAAGr TATTTGAAGG 17940

ACAGTCCAAT GATGAACACA AACCTGTGTk TTTCsTGGTC TTTTtTAGTG TTTTGAAGGG 18000

TGGkATACTA ATCTCAAAGA TAACAATTAT ATCCAAAGGA GGCAACATAT GCCAAACGTC 18060

AAAGAAATTA CAAGAGAGTC ATGGATTTTA GCCACTTTCC CAGAGTGGGG AACATGGTTG 18120

AACGAAGAAA TCGAAGAAGA AGTCGTACCT GAAGGCAACT TTGCCATGTG GTGGCTAGGC 18180

AACTGTGGTA CTTGGATTAA GACACCAGCT GGTGCTAACG TTGTCATGGA CCTTTGGTCA 18240

AACCGTGGAA AATCAACCAA AAAAGTGAAA GATATGGTTC GTGGGCACCA AATGGCAAAT 18300

ATGGCAGGTG TTCGTAAGCT GCAACCAAAC TTGCGTGTTC AGCCAATGGT TATCGATCCA 18360

TTTGCTATCA ACGAACTAGA CTATTACTTA GTTTCACACT TCCACAGTGA TCATATCGAC 18420

CCATACACAG CTGCAGCAAT TCTCAATAAT CCTAAGTTAG AGCATGTTAA GTTGG 18475


(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7186 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:






CCAGGATTTG GTACCGTTGC AAGTGGTGTG CCTTTCCTCC TAAAGGAAAA TGGAGGAAAA 60

ATCAATCAAT CAGCACATTC AGATATCAAA GTTGCTAAGG TATTGGTCAA GGATGAAGAT 120

GAAAAAAATC GCTTGCTTGC AGCAGGGAAT GACTTTAACT TTGTAACCAA TGTGGATGAT 180

ATTTTATCAG ACCAGGATAT TACTATCGTA GTGGAATTGA TGGGGCGTAT TGAGCCTGCT 240

AAAACCTTTA TCACTCGTGC CTTGGAAGCT GGAAAACACG TTGTTACTGC TAACAAGGAC 300

CTTTTAGCTG TCCATGGCGC AGAATTGCTA GAAATCGCTC AAGCTAACAA GGTAGCACTT 360

TACTACGAAG CAGCAGTTGC TGGTGGGATT CCAATTCTTC GTACTTTAGC AAATTCCTTG 420

GCTTCTGATA AAATTACGCG CGTGCTTGGA GTAGTCAACG GAACTTCCAA CTTCATGGTG 480

ACCAAGATGG TGGAAGAAGG CTGGTCTTAC GATGATGCTC TTGCGGAAGC ACAACGTCTA 540

GGATTTGCAG AAAGCGATCC GACGAATGAC GTAGATGGGA TTGATGCAGC CTACAAGATG 600

GTTATTTTGA GCCAATTTGC CTTTGGCATG AAGATTGCCT TTGATGATGT AGCCCACAAG 660 

GGAATCCGCA ATATCACACC AGAAGACGTA GCTGTAGCTC AAGAGCTTGG TTACGTAGTG 720 

AAATTGGTTG GTTCTATTGA GGAAACTTCT TCAGGTATTG CTGCAGAAGT GACTCCAACC 780 

TTCCTACCTA AAGCGCACCC ACTTGCTAGT GTGAATGGCG TAATGAACGC TGTCTTTGTA 840 

GAATCTATCG GTATTGGTGA GTCTATGTAC TAGGGACCAG GTGCGGGTCA AAAACCAACT 900 

GCAACAAGTG TTGTAGCTGA TATTGTCCGT ATCGTTCGTC GTTTGAATGA TGGTACTATT 960

GGCAAAGACT TCAACGAATA TAGCCGTGAC TTGGTCTTGG CAAATCCTGA AGATGTCAAA 1020 

GCAAACTACT ATTTCTCAAT CTTGGCTCTA GACTCAAAAG GTCAGGTCTT GAAGTTGGCT 1080 

GAAATCTTCA ATGCTCAAGA TATTTCCTTT AAGCAAATCC TTCAAGATGG CAAAGAGGGT 1140 

GACAAGGCGC GTGTCGTTAT CATCACACAC AAGATTAATA AAGCCCAGCT TGAAAATGTC 1200 

TCAGCTGAAT TGAAGAAGGT TTCAGAATTC GACCTCTTGA ATACCTTCAA GGTGCTAGGA 1260 

GAATAAGATG AAGATTATTG TACCTGCAAC CAGTGCCAAT ATCGGGCCAG GTTTTGACTC 1320 

GGTCGGTGTA GCTGTAACCA AGTATCTTCA AATTGAGGTC TGCGAAGAAC GAGATGAGTG 1380 

GCTGATTGAA CACCAGATTG GCAAATGGAT TCCACATGAC GAGCGTAATC TCTTGCTCAA 1440 

AATCGCTTTG CAAATTGTAC CAGACTTGCA ACCAAGACGC TTGAAAATGA CCAGTGATGT 1500 

CCCTTTGGCG CGCGGTTTGG GTTCTTCCAG CTCGGTTATC GTTGCTGGGA TTGAACTAGC 1560 

CAACCAACTG GGTCAACTCA ACTTATCAGA CCATGAAAAA TTGCAGTTAG CGACCAAGAT 1620 

TGAAGGGCAT CCTGACAATG TGGCTCCAGC CATTTATGGT AATCTCGTTA TTGCAAGTTC 1680 

TGTTGAAGGG CAAGTCTCTG CTATCGTAGC AGACTTTCCA GAGTGTGATT TTCTAGCTTA 1740 

CATTCCAAAC TATGAATTAC GTACTCGCGA CAGCCGTAGT GTCTTGCCTA AAAAATTGTC 1800 

TTATAAGGAA GCTGTTGCTG CAAGTTCTAT CGCCAATGTA GCGGTTGCTG CCTTGTTGGC 1860 

AGGAGACATG GTGACCGCTG GGCAAGCAAT CGAGGGAGAC CTCTTCCATG AGCGCTATCG 1920 

TCAGGACTTG GTAAGAGAAT TTGCGATGAT TAAGCAAGTG ACCAAAGAAA ATGGGGCCTA 1980 

TGCAACCTAC CTTTCTGGTG CTGGGCCGAC AGTTATGGTT CTGGCTTCTC ATGACAAGAT 2040 

GCCAACAATT AAGGCAGAAT TGGAAAAGCA ACCTTTCAAA GGAAAACTGC ATGACTTGAG 2100 

AGTTGATACC CAAGGTGTCC GTGTAGAAGC AAAATAAAGA ATAGAAGATA GGATGGGGAA 2160 

ACTCTTGACC AGAGGGGTTC ATATCCTTTT TGTGAAAAGA AGTTTATACT CAATGAAAAT 2220 

CAAAGAGCAA ACTAGGAAGC TAGCCGCAGG CTGCTCAAAA CAGTGTTTTG AGGTTGCAGA 2280 

TAGAACTGAC GAAGTCAGCT CAAGACACTG TTTTGAGGTT GCAGATAGAA CTGACGAAGT 2340

CAGTAACCAT ACTACGGTAA GGTGACGCTG ACGTGGTTTG AAGAGATTTT CGAAGAGTAT 2400

TAGTTAAAAA CGTGATAAAG GAGAAATAAA GATGGCAGAA ATTTATCTAG CAGGTGGTTG 2460 

TTTTTGGGGC CTAGAGGAAT ATTTTTCACG CATTTCTGGA GTGCTAGAAA CCAGTGTTGG 2520 

CTACGCTAAT GGTCAAGTCG AAACGACCAA TTACCAGTTG CTCAAGGAAA CAGACCATGC 2580 

AGAAACGGTC CAAGTGATTT ACGATGAGAA GGAAGTGTCA CTCAGAGAGA TTTTACTTTA 2640 

TTATTTCCGA GTTATCGATC CTCTATCTAT CAATCAACAA GGGAATGACC GTGGTCGCCA 2700 

ATATCGAACT GGGATTTATT ATCAGGATGA AGCAGATTTG CCAGCTATCT ACACAGTGGT 2760 

GCAGGAGCAG GAACGCATGC TGGGTCGAAA GATTGCAGTA GAAGTGGAGC AATTACGCCA 2820 

CTACATTCTG GCTGAAGACT ACCACCAAGA CTATCTCAGG AAGAATCCTT CAGGTTACTG 2880 

TCATATCGAT GTGACCGATG CTGATAAGCC ATTGATTGAT GCAGCAAACT ATGAAAAGCC 2940 

TAGTCAAGAG GTGTTGAAGG CCAGTCTATC TGAAGAGTCT TATCGTGTCA CACAAGAAGC 3000 

TGCTACAGAG GCTCCATTTA CCAATGCCTA TGACCAAACC TTTGAAGAGG GGATTTATGT 3060 

AGATATTACG ACAGGTGAGC CACTCTTTTT TGCCAAGGAT AAGTTTGCTT CAGGTTGTGG 3120 

TTGGCCAAGT TTTAGCCGTC CGATTTCCAA AGAGTTGATT CATTATTACA AGGATCTGAG 3180 

CCATGGAATG GAGCGAATTG AAGTTCGTTC TCGTTCAGGC AGTGCTCACT TGGGTCATGT 3240 

TTTCACAGAT GGACCGCGGG AGTTAGGCGG CCTCCGTTAC TGTATCAATT CTGCTTCTTT 3300 

ACGCTTTGTG GCCAAGGATG AGATGGAAAA AGCAGGATAT GGCTATCTAT TGCCTTACTT 3360 

AAACAAATAA AACAGAGAGT GGGGCTTCCC ACTTTCTTCA TTTCTAGAAT ATGAATAGAA 3420 

GGGATTTATG AAACACCTAT TATCTTACTT CAAACCCTAC ATCAAGGAAT CAATTTTAGC 3480 

CCCCTTGTTC AAGCTGTTAG AAGCTGTTTT TGAGCTCTTG GTTCCCATGG TGATTGCTGG 3540 

GATTGTTGAC CAATCTTTAC CTCAGGGAGA TCAAGGTCAT CTCTGGATGC AGATTGGCCT 3600 

GCTCCTTATC TTTGCAGTAA TTGGCGTTTT AGTGGCCTTG ATAGCTCAAT TTTACTCAGC 3660 

AAAGGCAGCA GTAGGTTCTG CTAAGGAATT GACAAACGAT CTTTATCGTC ATATTCTTTC 3720 

CTTGCCCAAG GACAGCAGAG ACCGTCTGAC AACTTCTAGT TTGGTCACTC GCTTGACTTC 3780 

GGATACCTAC CAGATTCAGA CTGGTATCAA TCAATTCCTG CGTCTCTTTT TACGAGCGCC 3840 

CATTATCGTT TTTGGTGCCA TTTTTATGGC TTATCGAATC TCAGCTGAGT TGACTTTCTG 3900 

GTTCTTAGTC TTGGTTGCCA TTTTGACCAT TGTCATTGTA GGGTTATCTC GATTGGTCAA 3960 

TCCTTTCTAC AGTAGTCTCA GAAAGAAAAC GGACCAACTG GTTCAGGAAA CGCGCCAGCA 4020 

ATTGCAAGGG ATGCGGGTTA TTCGTGCTTT TGGTCAAGAA AAACGAGAGT TACAGATTTT 4080

TCAAACCCTT AACCAAGTTT ATGCTAGATT ACAAGAAAAG ACAGGTTTCT GGTCTAGTTT 4140

ATTAACACCT CTGACCTATC TGATTGTCAA TGGAACTCTT CTCGTTATTA TCTGGCAAGG 4200 

CTATATTTCA ATTCAAGGAG GAGTGCTCAG TCAAGGTGCT CTCATTGCTC TTATCAATTA 4260 

CCTCTTACAG ATTTTGGTGG AATTGGTCAA GCTAGCCATG TTGATCAATT CCCTCAACCA 4320 

GTCCTATATC TCAGTCAAGC GAATCGAGGA AGTCTTTGTT GAGGCTCCAG AGGATATCCA 4380 

TTCAGAGTTA GAACAAAAGC AAGCTACCAG AGATAAGGTT TTACAAGTCC AAGAATTGAC 4440 

CTTTACCTAT CCTGATGCGG CCCAGCCTTC TCTGAGATAC ATTTCCTTTG ATATGACTCA 4500 

AGGACAAATT CTAGGTATCA TCGGGGGAAC TGGTTCTGGT AAATCAAGCT TGGTGCAACT 4560 

CTTACTTGGA CTTTATCCAG TAGACAAGGG GAACATTGAC CTTTATCAAA ATGGACGTAG 4620 

TCCTCTTAAT TTGGAGCAGT GGCGGTCTTG GATTGCCTAT GTACCTCAAA AGGTCGAACT 4680 

CTTTAAAGGA ACCATTCGTT CCAACTTGAC TCTAGGTTTC AATCAAGAAG TATCTGACCA 4740 

GGAACTCTGG CAGGCCTTGG AGATTGCGCA AGCTAAGGAT TTTGTCAGTG AAAAGGAAGG 4800 

ACTCTTGGAT GCTCTAGTTG AGGCAGGGGG GCGAAATTTC TCAGGTGGAC AAAAACAAAG 4860 

ATTGTCTATC GCCCGAGCAG TCTTGCGCCA GGCTCCGTTT CTCATCCTAG ATGATGCAAC 4920 

CTCGGCACTG GATACCATTA CAGAGTCCAA GCTCTTGAAA GCTATTAGAG AAAATTTTCC 4980 

AAACACGAGC TTAATTTTGA TCTCTCAACG AACCTCAACT TTACAGATGG CGGACCAGAT 5040 

TCTCCTCTTG GAAAAAGGTG AGTTGCTAGC TGTTGGCAAG CACGATGACT TGATGAAATC 5100 

CAGCCAAGTC TATTGTGAAA TCAATGCATC CCAACATGGA AAGGAGGACT AGAATGAAAC 5160 

GACAAACTGT AAACCAGACG CTCAAACGTT TAGCCGTAGA TTTAGCAAGC CATCCTTTCC 5220 

TCCTTTTCCT AGCCTTTCTA GGAACTATTG CCCAAGTTGG CTTATCAATT TACCTACCTA 5280 

TTCTGATTGG GCAGGTCATT GACCAAGTCC TAGTGGCTGG TTCATCACCA GTTTTTTGGC 5340 

AGATTTTTCT CCAGATGCTC TTGGTGGTAA TAGGAAATAC TCTGGTACAA TGGGCCAATC 5400 

CTCTCCTCTA TAATCGTCTA ATCTTCTCTT ATACCAGAGA TTTACGGGAG CGAATCATCC 5460 

ATAAGCTCCA TCGTTTACCG ATTGCCTTTG TAGATAGGCA AGGTAGTGGA GAGATGGTTA 5520 

GTCGTGTAAC CACGGACATC GAACAGTTGG CAGCTGGCTT GACCATGATT TTTAACCAAT 5580 

TTTTCATTGG TGTTTTGATG ATTTTGGTCA GTATTCTAGC CATGCTCCAA ATTCATCTCC 5640 

TCATGACTCT CTTAGTCTTG CTGTTGACGC CACTGTCCAT GGTGATTTCA CGCTTTATTG 5700 

CCAAGAAATC CTATCATCTC TTCCAGAAGC AAACAGAGAC GAGGGGAATT CAGACTCAGT 5760 

TGATTGAAGA ATCGCTTAGT CAGCAGACTA TAATCCAGTC CTTCAATGCT CAAACAGAAT 5820 

TTATCCAAAG ATTGCGTGAG GCTCATGACA ACTACTCAGG CTATTCTCAG TCAGCCATCT 5880 



TTTATTCTTC AACGGTCAAT CCTTCGACTC GCTTTGTAAA TGCACTCATT TATGCCCTTT 5940 

TAGCTGGAGT AGGAGCTTAT CGTATCATGA TGGGTTCAGC CTTGACCGTC GGTCGTTTAG 6000 

TGACTTTTTT GAACTATGTT CAGCAATACA CCAAGCCCTT TAACGATATT TCTTCAGTGC 6060 

TAGCTGAGTT GCAAAGTGCT CTGGCTTGCG TAGAGCGTAT CTATGGAGTC TTAGATAGCC 6120 

CTGAAGTGGC TGAAACAGGT AAGGAAGTCT TGACGACCAG TGACCAAGTT AAGGGAGCTA 6180 

TTTCCTTTAA ACATGTCTCT TTTGGCTACC ATCCTGAAAA AATTTTGATT AAGGACTTGT 6240 

CTATCGATAT TCCAGCTGGT AGTAAGGTAG CCATCGTTGG TCCGACAGGT GCTGGAAAAT 6300 

CAACTCTTAT CAATCTCCTT ATGCGTTTTT ATCCCATTAG CTCGGGAGAT ATCTTGCTGG 6360 

ATGGGCAATC CATTTATGAT TATACACGAG TATCATTGAG ACAGCAGTTT GGTATGGTGC 6420 

TTCAAGAAAC CTGGCTCACA CAAGGGACCA TTCATGATAA TATTGCCTTT GGCAATCCTG 6480 

AAGCCAGTCG AGAGCAAGTA ATTGCTGCTG CCAAAGCAGC TAATGCAGAC TTTTTCATCC 6540 

AACAGTTGCC ACAGGGATAC GATACCAAGT TGGAAAATGC TGGAGAATCT CTCTCTGTCG 6600 

GCCAAGCTCA GCTCTTGACC ATAGCCCGAG TCTTTCTGGC TATTCCAAAG ATTCTTATCT 6660 

TAGACGAGGC AACTTCTTCC ATTGATACAC GGACAGAAGT GCTGGTACAG GATGCCTTTG 6720 

CAAAACTCAT GAAGGGCCGC ACAAGTTTCA TCATTGCTCA CCGTTTGTCA ACCATTCAGG 6780 

ATGCGGATTT AATTCTTGTC TTAGTAGATG GTGATATTGT TGAATATGGT AACCATCAAG 6840 

AACTCATGGA TAGAAAGGGT AAGTATTACC AAATGCAAAA AGCTGCGGCT TTTAGTTCTG 6900 

AATAAGCCAT TCTCTTTTGA AAGTTTATGG ACGAAAAAAG TTGCCTTCGA GTGACTTTTT 6960 

TGTTACAATA GCTAGAAAAA TTGTTCACTG TAATACTCAA TGAAAATCAA AGAGCAAACT 7020 

AGGAAGCTAG CCGTAGGTTG CTCAAAGCAC AGCTTTGAGG TTGTAGATAA GACTGACGAA 7080 

GTCAGTTCAA AACACTGTTT TGAGGTTGCA GATAGAACTG ACGAAGTCAG CTCAAAACAC 7140 

TGTTTTGAGG TTGCAGATAG AACTGACGAA GTCAGCTCAA AACAGG 7186 
(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14273 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
CTGAAAATTC TAAAAAATTT ATAAGTAAGG AATTAATTAG TTATTTTTGT GATAAAGTTT 60

ATGATGAAAT ATTTGTTGAA GAGGTAGTTC CGCACGTTTT TCTGCCATAT GAATCTGACT 120

TACTTCTTAT TTTACCAGCT ACGGCAAATG TGATTGGCAA AATTGCTAAT GGTATTGCTG 180 

ATGATTTAGT TACAGCAACT GTTTTAAACT TTAATAAAAA AATAATTTTT TGTCCCAATA 240 

TGAACTCTAC TATGTGGGAC AATCACATAG TTCAAAGAAA TGTATCAATT CTAAAGGAGT 300 

TGGGACATAT ATTTTTATTT GAGTCTAAAA AAACATATGA GGTAGGATTG CGTAAAGCAA 360 

TAGATTCAAC ATGTTGAATG TTACAACCAC AGTCGTTAGT AAAAGAACTT ATCAAATTAG 420 

AAAATATTGT CCTTGAAGAG GGACATTAAA AACTACTGAG AATATTAATG AGGGGAAAAA 480 

ATGGAAAATT CATCAATCGA TGTAGATATG CTGTTGGAAG AATTGACACA AGAAGCAATG 540 

GTCGTTGTTG CTGTTGATAA GGACTGTTAA TTTAAACTTA TGGCAATATA TGAAAGGTTA 600 

CTGGATGTTT TAAATTATGC AGGCAGTAGC CTTTTATTAT ATACAAATGG ATAAAGTAAG 660 

GATAATACAA TGATTAATAA AAAAATACAA CAAGTTGTTT TGGAATCATT ACAGAATTTT 720 

TTGAATGGGA ACTTCATTTC GCCTTGTGTA GTCTATGATT TTGGCTTGCT GGAAACTGTA 780 

CTTGATGAAT TTAAAAATCA AATTCCTGTA ACATTCAATT ACCAACTTTT TTATGCCGTT 840 

AAAGCAAATT CAAATGAGAA GATACTTGAA TTCTTAGTAG ATAAAATTGA TGGAGTTGAT 900 

GTGGCGTCAT TATCTGAATT AGATGTGGCT AAAAAATTTT TCCCACCAAC TCAAATTTCT 960 

GTTAATGGTC CCGCATTTTC TTATGAAACT TTATATAATC TGATTAAAAA ACAATATAAA 1020 

GTTGATATTA ACTTTTTGGA ACATCTTCAA CAATTTTCCC CAAAAGAATC TGTTGGAATA 1080 

AGAGTAACGG AGCCAGATGA ACTTAATAAT CGTATGAGTC GATTTGGAAT AAATATTTGC 1140 

AGTGATAATT GGACTAGTAA TTTACAAAAT CCTTTAATTA CACGACTGCA TTTTCATTTT 1200 

GGAGAAAAAG ATGATAAATT TATTGTTAAG TTAGATAAAA TATTATTTAA GTTACAAGAA 1260 

ATTAATAAAC TTAGAGAGGT TAGAGAAATA AATCTTGGAG GCGGTTTTAT GAAATTATTT 1320 

ATGGAAAATC GTTTGAAAGA ATTTTTTCTA TCACTTATGG AAATCTATAA AAAGTACGAT 1380 

ATTGATAGTA CTGTGACTAC AATAATAGAA CCAGGTAGTG CAATTACTTC ATTTTCTGCC 1440 

TATATGATTA CTAGCCCAGT TAATGTTAGT GAGGTGAATG AGCAGCAGGT TATCACGTTA 1500 

GACACATCAA TATACACCAA TACATTATGG TTTGTTCCGC ATATTATTAC AACGTTAAAT 1560

TCAAGTAGTA AAGAGCGTTA TAGTACTATT CTCTATGGTA ATACCTGTTA TGAACATGAC 1620 

AAGTATAAAA TGAAAGTTTC GCTTCCAAGG TTAACTCAAA ATAGCAGTAT AGTGTTTTTT 1680 

CCTGTAGGAG CTTATATAAA AAGCAATCAT TCAAATTTAC ATCGTAATGA TTTTATGCGG 1740 

GAGGTATATT TGTGGACAAA AAACTTGACA TATTAGATAA AGTTAAGGAA TATTTAGGAA 1800 

ATAAAACTAC TCAAATTCTG GATAATCAAT ATAAAGAATT TTTGAAACTT AATGATATAA 1860

GGCGAGCGTT TGGTATTTCA GAAAAAGTAT TAAACAATTC TTTTAATTTT ACGAGTAAAG 1920

AATTTAATGA TTTAATTAAT AACGAAAATT ATTTATTCGA ATATGCATGT AGAATTAGAG 1980

AGGAATGGAG AAAAAAATGC TTTAATCATT CTTATCGTTT TCTATGCTCA CCTATAATTA 2040 

CAGATGATTT TCTTAACACG AAGACATTGA GAAGTAGCCA AATTGAATAT AAATATGAGC 2100 

GATATTTATC GAAAAGTTCG ATAGGCGATA GAGCGGTTGA TGGCTTTGTT TCCTTCAATA 2160 

CTTTAACAGC TAATGGTATG TCTGCTATTA AACTATGTCT TGAGATATTA AACTCTATTT 2220 

TCTTCAAGAA GAAGATTGAT TTATTATATT CAACCGGATA TTATGAAACA AGATTTTTAT 2280 

TAAATAATCT TGCTAAATCA GGTATTAGTT GCTATGAGGT AAGTAATTGT GAATTGGATA 2340 

AAGATAAATT TTATAATGTA TTCATGATGG AACCCAATCG AGCCGATTTA ACATTACAAA 2400 

AAACTGATTT CAAGATAGTA GAATATTTTG TTAAGTATAA AAATAATTCA ATAAAAGTCG 2460 

TTATTTTAGA TATTTCATAT CAAGGTTCTA ATTTTAAATT AGTAGAATTT TTAGAGAAAT 2520 

TTAAATTTGC GAATGTAATT ATTTTTGTGG TACGATCTTT GATAAAATTA GATCAAATGG 2580 

GATTAGAATT GACAAATGGG GGAATAATAG AAGTGTTTAT TCCTAATCAT TTGAGAAAGT 2640

TGAAAAATTT TATTGAAGAG GAATTCAATA AATTTAGAAA TTCTCACGGA GCTAATCTAA 2700 

GCCTCTATGA ATACTGTTTG CTTGATAATT CTTTAACTTT AAAAAATGAT TGGAACTATT 2760 

CTGATTTAGT TATGAAATTT ACGAGTAATT TTTATGCTGA TATAAAAGAC TTGTTCATGG 2820 

AAAATTCTGA TATTGAAATC ATCCATGAAG AGGGAGTACC TTTTGTATTT TTAGATTTAA 2880 

TAGGTGAAGG TAAAAAAGAA TATGAAATGT TTTTTCAATG GTTAAACTTC TTTTACAAAC 2940 

AGCTTGGAAT CACATTGTAT GCTAGAAATA GTTTTGGGTT TCGGAATCTA ACAGTAGAGT 3000 

ATTTTGGAAT TATTGGGACA GAAAGATATA TATTTAAGAT TTGTCCAGGT GTTTATAAAG 3060 

GGTTAAGTTA TTATTTGATG AAATTTTTAT TAAAATCTTT TTCAAATGAA TATTTAAAAA 3120 

CTACTGATGA GGTTAATAGA TGAAAAATTT GATAAAGTTG CTAATAATTA GATTGATTGT 3180 

TAACTTAGCA GACAGTGTAT TTTATATAGT AGCATTGTGG CACGTTAGCA ATAATTATTC 3240 

TTCGAGCATG TTCTTAGGAA TATTTATTGC AGTAAATTAT CTACCGGATT TGTTACTAAT 3300 

CTTTTTTGGA CCAGTTATTG ACAGAGTAAA TCCGCAAAAA ATTCTTATAA TATCAATTTT 3360 

GGTTCAATTA GCAGTGGCTG TAATATTTTT ATTATTATTA AACCAAATAT CATTTTGGGT 3420 

GATAATGAGT CTAGTGTTTA TTTCAGTAAT GGCTAGCTCC ATAAGTTACG TGATAGAAGA 3480 

TGTGTTGATT CCTCAAGTGG TAGAATATGA TAAGATTGTA TTTGCAAATT CTCTTTTTAG 3540 

TATTTCGTAT AAAGTATTAG ATTCTATTTT TAATTCATTC GCATCATTTT TACAGGTGGC 3600

AGTAGGATTT ATTTTATTGG TTAAGATAGA TATAGGCATA TTTTTACTTG CTCTATTTAT 3660

ATTGTTGTTG TTAAAATTTA GAACTAGCAA TGCGAATATA GAAAACTTCT CTTTCAAATA 3720 

TTACAAGAGA GAAGTGTTGC AAGGTACAAA GTTTATTTTA AATAATAAAT TATTATTTAA 3780 

AACCAGTATT TCTTTAACGC TTATAAACTT TTTTTATTCA TTTCAGACAG TAGTTGTACC 3840 

GATTTTTTCT ATTCGATATT TTGATGGTCC GATTTTTTAT GGTATTTTTT TAACTATTGC 3900 

TGGTTTGGGT GGTATATTGG GAAATATGCT AGCGCCAATC GTAATAAAAT ATTTAAAATC 3960 

GAATCAAATT GTTGGTGTAT TTCTTTTTTT GAACGGCTCA AGTTGGTTAG TAGCAATTGT 4020 

TATAAAAGAC TATACTTTAT CACTTATTTT ATTTTTCGTT TGTTTTATGT CTAAAGGAGT 4080 

CTTCAATATT ATTTTTAATT CGTTGTACCA ACAAATACCT CCACATCAAC TTCTTGGTAG 4140 

GGTAAATACT ACCATTGATT CTATTATTTC TTTTGGAATG CCAATTGGTA GTTTAGTTGC 4200 

AGGAACGCTT ATTGATTTGA ATATTGAATT AGTGTTAATT GCTATTAGCA TACCTTATTT 4260 

TTTGTTTTCT TATATTTTTT ATACGGATAA TGGATTGAAA GAATTTAGTA TATATTAGAA 4320 

ATGTTTATGT TCATTCAAAA GCATAATGAC TATAACTGAA AAAGAAAAGT GATATCTTTA 4380 

AGGTTGTTCT TCTTGGTGGT GAGATTCGTG AGACAACCCA AGCTTTTGTC GGAAAGATTA 4440 

CCAATGCTTT GATGGATAGG ATGTACTTTA GCAAGATGTT TTTAGTGGTA ACGGTATCGT 4500 

GGATGGACGT GTAATAACCT CTTCTTTCGA GGAGTATTTT ACTAAAAAAC TAGCCTTGGA 4560 

GCGTTCCCCA GAAACGGACT TACTCATTGA CTCTTCAAAG ATTTGGGGAG AAGATTTTGC 4620 

TTCATCTGTT CCTTGAAAAA AGTCACAGCA GTCATCACAG ACGATAGTAC TGAACAAAAC 4680 

TATGAAGAGT TAGAAATTTA TACGCAGGTG ATTGTATAAA GGATCTGGAA ATAGATAAGA 4740 

AGTTGATTAG TATTGACCTA GGTGGTACAA ATATTAAGAT TACTGTTCTT TCAAATGACG 4800 

GTGAGATTGA AACTTTGTGG AGTATTACAA CAGATACAAG TGAGAAAGGT TCTCAAATTA 4860 

TATCGGACAT CATCAGTTCT ATTAAAAATA AATTGACCGA ACGGAATATT CCTGATAGCG 4920 

ACCTTCTTGG AATCGGTATG GGAAGTTGCT CATCATACTT TCCTTGTAAA TCATAGGGGC 4980 

TATAAACTCT CCGTCTACTT GTCCTGCAAC AATTGAAGTC TGCTCAAAAC GCCGTCCGCT 5040 

AATCTTTTCA TAGACTTTCT CCCTTTTAGG AGCCTAGCTT TCTAGTTTGT TCTTTGATTT 5100 

TTATTGAGTA TACCACTATT TTACTCCCTC TGGCAAGGGA CTTTGTCTAT GTGGAGGGAT 5160 

TGGGCTCCTA TGTGGTGGAG CTTTTCTGTT CTTTCTGAAA TATGGTATAA TAGCACTAAT 5220 

CAATTTCTAG GAAAATAGAT ACAGAAAGGG GCTGAAAGAT GTCTCATATT ATTGAATTGC 5280 

CAGAGATGCT GGCAAACCAA ATCGCGGCTG GAGAGGTCAT TGAACGTCCT GCCAGTGTGG 5340 

TCAAAGAGTT GGTAGAAAAT GCCATTGACG CGGGCTCTAG TCAGATTATC ATTGAGATTG 5400

AGGAAGCTGG TCTCAAGAAG GTTCAAATCA CGGATAACGG TCATGGAATT GCCCACGATG 5460

AGGTGGAGTT GGCCCTGCGT CGCCATGCGA CCAGTAAGAT AAAAAATCAA GCAGATCTCT 5520 

TTCGGATTCG GACGCTTGGT TTTCGTGGTG AAGCCTTGCC TTCTATTGCG TCTGTTAGTG 5580 

TCTTGACTCT GTTAACGGCG GTGGATGGTG CTAGTCATGG AACCAAGTTA GTCGCGCGTG 5640 

GGGGTGAAGT TGAGGAAGTC ATCCCAGCGA CTAGTCCTGT GGGAACCAAG GTTTGTGTGG 5700 

AGGATCTCTT TTTCAACACG CCTGCCCGTC TCAAGTATAT GAAGAGCCAG CAAGCGGAGT 5760 

TGTCTCATAT CATTGATATT GTCAACCGTC TGGGCTTGGC CCATCCTGAG ATTTCTTTTA 5820 

GCTTGATTAG TGATGGCAAG GAAATGACGC GGACAGCAGG GACTGGTCAA TTGCGCCAAG 5880 

CAATCGCAGG GATTTACGGT TTGGTCAGTG CCAAGAAGAT GATTGAAATT GAGAACTCTG 5940 

ACCTAGATTT CGAAATTTCA GGTTTTGTGT CCTTGCCTGA GTTGACTCGG GCTAACCGCA 6000 

ATTATATCAG CCTCTTCATC AATGGCCGTT ATATTAAGAA CTTCCTGCTC AATCGTGCTA 6060 

TTTTGGATGG TTTTGGAAGC AAGCTTATGG TTGGACGTTT TCCACTGGCT GTCATTCACA 6120 

TCCATATCGA CCCTTATCTA GCGGATGTCA ATGTGCATCC AACTAAGCAA GAGGTGCGGA 6180 

TTTCCAAGGA AAAAGAACTG ATGACTCTGG TTTCAGAAGC TATTGCAAAT AGTCTCAAGG 6240 

AACAAACCTT GATTCCAGAT GCCTTGGAAA ATCTTGCCAA ATCGACCGTG CGCAATCGTG 6300 

AGAAGGTGGA GCAAACTATT CTCCCACTCA AAGAAAATAC GCTCTACTAT GAGAAAACTG 6360 

AGCCGTCAAG ACCTAGTCAA ACTGAAGTAG CTGATTATCA GGTAGAATTG ACTGATGAAG 6420

GGCAGGATTT GACCCTGTTT GCCAAGGAAA CCTTGGACCG ATTGACCAAG CCAGCAAAAC 6480 

TGCATTTTGC AGAGAGAAAG CCTGCTAACT ACGACCAGCT AGACCATCCA GAGTTAGATC 6540 

TTGCTAGCAT CGATAAGGCT TATGACAAAC TGGAGCGAGA AGAAGCATCC AGCTTCCCAG 6600 

AGTTGGAGTT TTTCGGACAA ATGCACGGGA CTTATCTCTT TGCCCAAGGG CGAGATGGAC 6660 

TTTACATCAT AGATCAGCAC GCTGCTCAGG AACGGGTCAA GTACGAGGAG TACCGTGAAA 6720 

GCATTGGCAA TGTTGACCAA AGCCAGCAGC AACTCCTAGT GCCCTATATC TTTGAATTTC 6780 

CTGCGGATGA TGCCCTGCGT CTCAAGGAAA GAATGCCTCT CTTAGAGGAA GTGGGCGTCT 6840 

TTCTAGCAGA GTACGGAGAA AATCAATTTA TTCTACGTGA ACATCCTATT TGGATGGCAG 6900 

AAGAAGAGAT TGAATCAGGC ATCTATGAGA TGTGCGACAT GCTCCTTTTG ACCAAGGAAG 6960 

TTTCTATCAA GAAATACCGA GCAGAGCTGG CTATCATGAT GTCTTGCAAG CGATCTATCA 7020 

AGGCCAATCA TCGTATTGAT GATCATTCAG CTAGACAACT CCTCTATCAG CTTTCTCAAT 7080 

GTGACAATCC CTATAACTGT CCTCACGGAC GTCCTGTTTT GGTGCATTTT ACCAAGTCGG 7140

ATATGGAAAA GATGTTCCGA CGTATTCAGG AAAATCACAC CAGTCTCCGT GAGTTGGGGA 7200

AATATTAAAA GTATAAAAAA GTCTGGGAAA AATTTTCAAA ATCAAAAAAA CGCATAAAAT 7260 

CAGGTGTTCA AAAACCTTGA TTTTATGCGT TTTATCATGG AAATAGTTAC TTCATTTTTT 7320 

CCTAATTCTT TTCGAAACTC TTTTTAAACG ACGTCAGTTT TATCAGTAAT CTCAAAACAG 7380 

TGTTTTGAGC TAATTTTGCC AGTTTTGTCT GTAACATCGA AGTTGTGTTT TACCACTCTG 7440 

CGACTGGTTT CCTAGTTTGC TCTATGATTT TCACAGAGCA TTAAATTGCG ATTTTGCCAA 7500 

GTTTCTTTAT TCGTCTAAAA GTAGAGTCTG TTCTATGCGT CTAATGTACG AATCAGGTTG 7560 

ACCATTTCAA TAGCTCCTTG TGCACACTCA GAACCCTTAT TTCCTGCTTT AGTACCAGCT 7620 

CGTTCTATGG CTTGTTCAAT TGTATCTGTC GTTAGCACAC CAAACATAAC AGGAATTTCG 7680 

CTATTTAAAC TGATTTGGGC GATTCCCTTA GATACCTCGC TACATACATA ATCATAATGA 7740 

CTTGTATTCC CTCTAATGAC AGCTCCCAAG CAGATAATTG CATCATATTT TTTACTTTTT 7800 

GCCATTTTTG ATGCAATCAG TGGTATTTCA AAAGCTCCTG GAACCCAGGC TACCTCTATA 7860 

TCTTTCTCGT TTACATTCTC TCTTTTGAGA TTATCTAGTG CTCCAGATAA TAATTTTGAA 7920 

GTTATAAATT CATTAAATCT CGCTACAACA ATACCTATTT TAATATTGTT TGCTACTAAA 7980 

TTACCTTCAT AAGTGTTCAT TTATTTTTCC TCCATATTTA AAATGTGACC CATTCGATTT 8040 

TTCTTTGTTT CTAAATAAAA ACTATCGTAA GGATTGGCTT CTATTTCGAT TGATATTCTA 8100 

CTGGAAATGG TAATTCCATA TTTTTCTAAC TGTTCAACCT TGTCAGGATT ATTTGTCAGT 8160 

AAATGAAGTG ACTGAAGTCC CAGATCTTTA AGCATTTTTG CTCCAATATG ATATTCTCTT 8220 

AAATCACCTT CAAAGCCTAA TGCAAGATTG GCATCAAGCG TATCCATGCC TTGATCTTGT 8280 

AAATGATAGG CTTTTAATTT ATTGATAAGT CCAATTCCTC GTCCCTCCTG TCGCAAGTAA 8340 

AGTAAGACAC CCGAACCATT CTCAACAATC ATTTTCATAG CCTTATCGAA TTGCTGTCCA 8400 

CAATCGCAAC GTAAAGAGCC TAAAACATCT CCTGTTAAAC ATTCGGAGTG GACCCGACAT 8460 

AATACATTGG CTTCATCCTC TATATTTCCC ATAATAAGAG CAAGATGATG TTCCCCATTT 8520 

AGTTTATCTA TATAGCTAAT TGCTTTGAAA TTACCGTATC TAGTAGGCAT ATTGACAGTT 8580 

GAAACTCGTT CTACCAGCTG ATCATATACT TTTCTATATT CTTGTAATTC TTTGATGGTA 8640 

ATTAGTGGAA TGTTGTGTTT TTTCGAGAAC TGAATTAAAT CATCTGTTCT CATCATTTTG 8700 

CCATCATGAT TCATTATTTC ACAACATAGG CCACACTCTT TTAGTCCAGC TAATTTTAAT 8760 

AAATCAACAG TTGCTTCTGT GTGTCCATTT CTTTCTAGGA CACCACCTTT TTTTGCAATT 8820 

AAAGGAAACA TGTGTCCTGG CCTGCGAAAA TCAGAGGGTG TTATATCTTC AGCTACACAC 8880

ATACGTGCGG TCAGTCCTCT TTCCTCGGCA GAAATACCTG TGGTCGTTTC TTTATAATCA 8940

ATTGAAACTG TAAAAGCAGT CTTATGATTA TCTGTATTGT TTTCAACCAT AGGTGAAAGC 9000

ATTAATTGAT TAGCTAAACT TTCGCTCATA GGCATACAAA TTAATCCTTT GGCATAAGTA 9060 

GCCATAAAAT TAACATTTTC TGTTGTAGCT GCTTGTGCAG AACAAATTAA GTCTCCTTCA 9120 

TTTTCTCTAT CCTTGTCGTC TATAACAAGA ACAAGTCGTC CCTTCTGCAA TGCTTCTAAT 9180 

GCTTCTTGTA TTTTTCGATA TTCCATTGAC TGATTATCCT TTCTGCTAAA ATCCATTTTG 9240 

ATATAATAGT TCCTTAGATA TTTCTGATTT TGGAGAGTTA TCCATCAGTT TTTGCACATA 9300 

TTTACCTAAG ATATCATTTT CAAGATTTAC TGTACTCCCG ACTTGTTTAC TCTTAAGAAT 9360 

GGTTTGTTCC AAGGTATGAG GGATAACAGA TACTGAAAAG TTTACTTTGG AGACTTTAGC 9420 

GACAGTCAGA CTAATGCCGT CAATTGTAAT AGATCCTTTT TCAACTATTA AATCTAAAAT 9480 

TTCTTTTTGT GTGTTGATTT GATACCATAC AGCATTATCA TCTTTTTTTA TTGACGAGAT 9540 

TTTTCCTGTA CCATCAATGT GTCCTGTAAC GACGTGACCC CCAAGTCGAC CGTTGACAGA 9600 

TAAGGCTCTT TCTAGATTCA CCTCACTTCC ATGTTTTAAT AGAGTAAGAG CTGTTCGACT 9660 

CCATGTTTCA TTCATTACAT CAACTGTAAA GGATTGATGA TTGAAATGAG TAACTGTAAG 9720 

ACAGATACCA TTTACTGCTA TACTATCGCC TAAATGGATA TCCGTTAATA TTTTTGAGGC 9780 

TTTAATTGAT AGTTTACAAT TACGAGAGTC TTTCTGTATT CTTTCAACTT TTCCGATTTC 9840 

TTCAATTATT CCTGTGAACA TGGATAAATC ACTTCACTTT CTATGAGATA GTCATTTCCT 9900 

ATTTGAGAAA ATGCATAAGG TTTCAATCTA ATAGCGTCAT TTGGCAAAGA AATACCTTCA 9960 

CCTCCGACAG GAAACTTGGC ACTACCTCCA AAAACTTTTG GTGCAATATA TATTTTCAGC 10020 

TCATCAACAA TTTGTTGTTC CAAAGCACTC CAATTCATTA GACTGCCCCC TTCTAGAACT 10080 

AGGCTATCAA TCTGCATGTT TCCTAGATGT TGCATTAAAC TCGATAAGTC TATATGATTG 10140 

CCTTTTTTCT TTATGGAAAG TATTTCACAG CCATGATTTT GATATAGCTT CATTTTATTT 10200 

TTGTCTTCAG AGGAAGTGGC AATGTAAGTT TTAATATCAT TTGCTGTTTT TACGATTTTA 10260

GAGGTAAGAG GAGTTCGTAA ATGTGTATCG CATATGATAC GGATAGGATT TTTCCCTTCC 10320 

TCCAATCTAC ATGTCAGCAA AGGATCGTCT TGAATAACAG TATTGACTCC CACCATAATT 10380 

GCACTAACAT GGTGTCGTAA CTGATGCACA TGCTTTCTTG CTTCTTCTTC AGTAATCCAT 10440 

TTGGATTGAT TTGTTTTAGT GGCTATTTTT CCATCCATTG ACATTGCATA TTTCATAAAA 10500 

ACATAGGGTA CATGCTGGGT AATATACTTT CTAAAACTTT TTATTAAGTT AAGACACTCA 10560 

TTTTCTAAAA TTCCAACAGT AACTTGAAGA TTATTTTCCT CAAGTATCTT TACTCCTTTT 10620 

CCAGATACAA TAGGATTACA GTCTAGGCTT CCAATGACTA CTCTTGTAAT ACCACTATCG 10680

ATTATAGCAT CTATACAGGG AGGTGTTTTC CCGAAGTGAC AACAGGGTTC AAGTGTTACA 10740

TAAAGCGTCG CTCCGACAGG GGATTCTCTA CAGTTTTTAA GAGCATTTCT CTCAGCATGT 10800 

GGGCCACCAA AAAACTCATG ATAACCTTGT CCGATAATGT GATTATCTTT TACAATAACT 10860 

GCGCCGACCA TAGGATTGGG ATTGACGTAA CCAGCCCCTT TTTGTGCCAG TTTTATTGCT 10920 

AATTTCATAT ATTTTGAATC GCTCATCTCG CTACCTCCAA AAAAATATAC CTTGAATAGG 10980 

GGACTACTCA AGGCATACAA AAGAAAACTT ATGCGATTAA CAAAAATGCT CTGAAATGAC 11040 

AAGTAATCAT TTCAGAGCAC GCAAAAAGCA CAAATATACT TTTATCTTCT TTCATCCAGA 11100 

CTATACTGTC GGCTTTGGAA TTTCACCAAA TCATGCCTTT CGGCTCGTGG GCTATACCAC 11160 

CGGTAGGGAA TTTCACCCTG CCCTGAAGAT AGTTATTCAA TTACAGATGA TTATAGTACT 11220 

TAATTTTGAA TATGTCAACA GATAAATACC GATTGTTTTT GATATACTGT ATTTGTGATA 11280 

ATCGATTCTC GCTCCTCGGA TAAAGAAAAT ATGATATACT AGATAAACGA AATAAGAGAG 11340 

AAGGAATACT ATGTACGCAT ATTTAAAAGG AATCATTACC AAAATTACTG CCAAATACAT 11400 

TGTTCTTGAA ACCAATGGTA TTGGTTATAT CCTGCATGTG GCCAATCCTT ATGCCTATTC 11460 

AGGTCAGGTT AATCAGGAGG CTCAGATTTA TGTGCATCAG GTTGTGCGTG AGGACGCCCA 11520 

TTTGCTTTAT GGATTTCGCT CAGAGGATGA GAAAAAGCTC TTTCTTAGTC TGATTTCGGT 11580 

CTCTGGGATT GGTCCTGTAT CAGCTCTTGC TATTATCGCT GCTGATGACA ATGCTGGCTT 11640 

GGTTCAAGCC ATTGAAACCA AGAACATCAC CTACTTGACC AAGTTCCCTA AAATTGGCAA 11700 

GAAAACAGCC CAGCAGATGG TGCTGGACTT GGAAGGCAAG GTAGTAGTTG CAGGAGATGA 11760 

CCTTCCTGCC AAGGTCGCAG TGCAAGCAAG TGCTGAAAAC CAAGAATTGG AAGAAGCTAT 11820 

GGAAGCCATG TTGGCTCTGG GCTACAAGGC AACAGAGCTC AAGAAAATCA AGAAATTCTT 11880 

TGAAGGAACG ACAGATACAG CTGAGAACTA TATCAAGTCG GCCCTTAAAA TGTTGGTCAA 11940 

ATAGGAGCAG AGAATGACAA AACGTTGTTC GTGGGTCAAG ATGACCAACC CGCTCTACAT 12000 

CGCCTATCAT GATGAGGAGT GGGGCCAGCC CCTCCATGAT GACCAAGTAT TGTTTGAGTT 12060

GTTGTGTATG GAAACCTATC AGGCAGGCCT GTCTTGGGAA ACGGTACTCA ACAAACGCCA 12120

AGCTTTCCGA GAAGTCTTTC ATAGCTATCA AATTCACTCA GTCGCAGAGA TGACTGACAC 12180 

TGAATTGGAA GCCATGCTGG AGAATCCAGC TATCATTCGA AATAGAGCCA AGCTTTTTGC 12240 

TACACGCGCT AACGCCCAAG CCTTTCTACA GTTACAGGCA GAGTACGGCT CTTTTGATGC 12300 

CTATCTTTGG TCTTTTGTTG AGGGGAAAAC TGTCGTTAAC GATGTTCCTG ATTATCGCCA 12360 

AGCGCCAGCT AAAACACCCT TATCTGAGAA ATTAGCCAAA GATCTCAAAA AACGAGGCTT 12420 

CAAGTTCACA GGCCCAGTCG CCGTATTGTC TTTTCTACAG GCTGCAGGGC TAGTTGATGA 12480

CCACGAGAAT GATTGTGAGT GGAAAGGTCT TAAATGATGT CTAACAAAAA TAAGGAAATT 12540

CTGATTTTTG CGATTCTCTA TACAGTCCTC TTTATGTTTG ATGGCGTTAA ATTGCTGGCT 12600 

TCTTTAATGC CATCTGCCAT TGCAAATTAT CTTGTTTATG TAGTTTTAGC TCTATATGGC 12660 

TCCTTCTTGT TCAAGGATAG ATTGATCCAA CAATGGAAGG AGATTAGAAA GACTAAAAGA 12720

AAATTCTTCT TTGGAGTCTT AACAGGATGG CTCTTTCTCA TTCTGATGAC TGTTGTCTTT 12780 

GAATTTGTAT CAGAGATGTT GAAGCAGTTT GTGGGACTAG ATGGACAAGG TCTAAATCAG 12840 

TCTAATATTC AAAGTACCTT TCAAGAACAA CCACTACTGA TAGCTGTTTT TGCTTGTGTC 12900 

ATTGGACCTC TGGTAGAAGA ATTATTTTTC CGTCAGGTCT TATTGCATTA CTTGCAGGAA 12960 

CGGTTGTCAG GTTTACTAAG CATTATTCTG GTAGGACTTG TTTTTGCTCT GACTCATATG 13020 

CACAGTTTGG CTCTATCAGA GTGGATTGGT GCAGTTGGTT ACTTAGGTGG AGGCCTTGCC 13080 

TTTTCTATTA TTTATGTGAA AGAAAAAGAG AATATCTACT ATCCCCTACT TGTTCACATG 13140 

TTAAGCAACA GCCTCTCCTT AATCATTTTA GCTATCAGTA TAGTAAAATG AAATGAGAAC 13200 

AGGACAAATC GATTTCTAAC AATGTTTTAG AAGTAGAGGT GTACTATTCT AGTTTCAATA 13260 

TACTGTAATA TGTGATGAAA ATGCCAGTAA TGATACCGAG AAAAAAGCTG AGAAACTTTT 13320 

CCCAGCTTTA TTTGTTATAG TCAAAGAGAA TGACTTGTTC CTGTGCATCT ACATGAGCAT 13380 

GGACCCCAAA GGGTACAATT GCTCTTGGAG TTGCGTGGCC GACATTCAGA TTATAGACAA 13440 

TCGGGATATT GCTGTCAATG ATATCCAATA GTGCCTCTTT ATAGTCGTCA TGGAAAGTTT 13500 

CATCCATAGG TTTTCCGACC AAGAGTCCAT TGATGACCGC GAATATGCCA GTGTCCTTTA 13560 

AAGTTAGCAA CATCTTTTTG AAGTCTTCTG GCTTAGGCTT TTCTTCGCTT GTTTCGAGCA 13620 

AGAGGATTTT CCCTTCCCAG TCTGACAAGT CAGGGAAAAG TTTGTATTTT TGGCAGAGTT 13680 

CCGTGCTATC TGCGTATCGA GAGTTGTCAA AGATATCGTA GAGGGATTCG AGGCAACCAC 13740 

CGAGGATTTT CCCCTCGAAC TGGGCACTTC CTTGCAACAA GTCAAAACCT GTATTTGTAT 13800 

GACTGACACG AGGTGTTCCC AGGGCCGTGG GACTAAAATC AGTTCGTTCC TCATACCAAA 13860 

CGTCACTAGG GCGGATTTCT GAAATTCTTC CCGTCTCAAT CAATTCTTTA AAGTAGTGAA 13920 

GGCTATAGGC TAGCATTTCT TTGTCTAATT CACAAATGTC TGCTAAAAAG GATTGACCAT 13980 

AAAAAGTCTT GATTCCTAAT TTATGCAACA TGAGGTGGTT CATGGTTGTA TCCGAGAAGC 14040 

CAAGAAAAAT TTTTTGCTTG ATAACCTTTT GGAGTTGGTC ATTTTCAAAA AGATAAGGTA 14100 

GCAAGCGATA GGTATCGTCT CCACCGATGG CACATAGGAT CATGTCGATG CTATCATCAG 14160 

AAAAGGCATG AATCAAATCC TCTGCACGAG CTTCAGGATG GTCCTTGATA AAGTCTAATC 14220

CTTTTAACGA ATGGGGCAAA AAGATGGGAT TGGTCCCAGA TCCTTGAGAC GTT 14273
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9828 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 

GTGAAGTGCG GCAAAAGGTG CAAGTGATGA GCTCAGGTTC TTTAGCTCTT GACATTGCCC 60 

TTGGCTCAGG TGGTTATCCT AAGGGACGTA TCATCGAAAT CTATGGCCCA GAGTCATCTG 120 

GTAAGACAAC GGTTGCCCTT CATGCAGTTG CACAAGCGCA AAAAGAAGGT GGGATTGCTG 180 

CCTTTATCGA TGCGGAACAT GCCCTTGATC CAGCTTATGC TGCGGCCCTT GGTGTCAATA 240 

TTGACGAATT GCTCTTGTCT CAACCAGACT CAGGAGAGCA AGGTCTTGAG ATTGCGGGAA 300 

AATTGATTGA CTCAGGTGCA GTTGATCTTG TCGTAGTCGA CTCAGTTGCT GCCCTTGTTC 360 

CTCGTGCGGA AATTGATGGA GATATCGGAG ATAGCCATGT TGGTTTGCAG GCTCGTATGA 420 

TGAGCCAGGC CATGCGTAAA CTTGGCGCCT CTATCAATAA AACCAAAACA ATTGCCATTT 480 

TTATCAACCA ATTGCGTGAA AAAGTTGGAG TGATGTTTGG AAATCCAGAA ACAACACCGG 540 

GCGGACGTGC TTTGAAATTC TATGCTTCAG TCCGCTTGGA TGTTCGTGGT AATACACAAA 600 

TTAAGGGAAC TGGTGACCAA AAAGAAACCA ATGTCGGTAA AGAAACTAAG ATTAAGGTTG 660 

TAAAAAATAA GGTAGCTCCA CCGTTTAAGG AAGCCGTAGT TGAAATTATG TACGGAGAAG 720 

GAATTTCTAA GACTGGTGAG CTTTTGAAGA TTGCAAGCGA TTTGGATATT ATCAAAAAAG 780 

CAGGGGCTTG GTATTCTTAC AAAGATGAAA AAATTGGGCA AGGTTCTGAG AATGCTAAGA 840 

AATACTTGGC AGAGCACCCA GAAATCTTTG ATGAAATTGA TAAGCAAGTC CGTTCTAAAT 900 

TTGGCTTGAT TGATGGAGAA GAAGTTTCAG AACAAGATAC TGAAAACAAA AAAGATGAGC 960 

CAAAGAAAGA AGAAGCAGTG AATGAAGAAG TTCCGCTTGA CTTAGGCGAT GAACTTGAAA 1020 

TCGAAATTGA AGAATAAGCT GTTAAAGCAG TGGAGAAATC CGCTACTTTT TCGATTTTTG 1080 

ATTCAAGTTT TTAGATTATA TATAGTAGCT TGAAATAAGA TATGAACAAC TCTATTAGGA 1140 

AAGTCAAATT AATTTCTAGA AATGTTTTAG CAGCTACAGC GTACTATTCC AAACTCAACC 1200 

AACTATAATA GATCGAAACT AGAATAGTAC ATATCTACTT CTAAAACATT GTTAAAAATC 1260 

GATTTGACTT TCCTTATTTC ATTCCGCTAT ATATAGTTTG CTGTTTCTTG TCGCTCCTCT 1320 

GGAAAGCTGA TATAATAGCT TTATGAATAA AAAACGAACA GTGGACCTGA TACATGGTCC 1380

GATTCTTCCC TCGCTCTTAA GCTTCACCTT TCCAATTTTG CTATCAAATA TTTTTCAACA 1440

GCTCTATAAC ACTGCTGATG TCTTGATTGT TGGACGATTT CTTGGTCAAG AATCCTTGGC 1500 

TGCAGTAGGA GCGACGACAG CGATTTTTGA CCTGATTGTA GGTTTTACAC TTGGTGTTGG 1560 

CAATGGCATG GGGATTGTCA TTGCTCGTTA TTATGGGGCT CGGAATTTCA CTAAAATCAA 1620 

GGAAGCAGTA GCAGCCACCT GGATTTTAGG TGCTCTTTTG AGCATTCTAG TTATGTTGCT 1680 

GGGCTTTCTT GGCTTGTATC CTCTCTTGCA ATACTTAGAT ACTCCTGCAG AAATTCTTCC 1740 

TCAATCTTAT CAATATATTT CTATGATTGT GACCTGTGTA GGTGTCAGCT TTGCTTATAA 1800 

TCTTTTTGCA GGCTTGTTGC GGTCTATTGG TGACAGTCTA GCAGCCCTGG GATTTCTGAT 1860 

TTTCTCTGCC TTGGTTAATG TGGTTCTGGA TCTCTATTTT ATTACGCAAT TGCATCTGGG 1920 

AGTTCAATCC GCAGGACTTG CTACCATTAT TTCGCAAGGT TTATCAGCGG TTCTCTGCTT 1980 

TTATTATATT CGTAAAAGTG TGCCAGAACT CTTGCCACAG TTTAAACATT TCAAATGGGA 2040 

CAAAAGCTTG TACGCGGATC TCTTGGAGCA AGGTTTGGCT ATGGGCTTGA TGAGTTCAAT 2100 

TGTATCTATC GGCAGTGTGA TTTTACAGTT TTCTGTTAAT ACATTTGGTG CAGTGATTAT 2160 

TAGTGCCCAG ACGGCAGCTC GACGCATTAT GACCTTTGCC CTTCTTCCTA TGACCGCTAT 2220 

TTCTGCATCA ATGACGACCT TTGCTTCTCA GAATCTAGGA GCTAAGCGAC CTGACCGTAT 2280 

TGTTCAAGGT CTTCGAATCG GCAGTCGTTT AAGTATATCC TGGGCAGTTT TTGTTTGTAT 2340 

TTTCCTCTTT TTTGCCAGTC CAGCTTTGGT TTCCTTCTTG GCTAGTTCGA CAGATGGTTA 2400 

CTTGATAGAA AATGGAAGTC TCTATCTGCA AATCAGTTCA ACCTTTTATC CCATTTTGAG 2460 

CCTCTTGTTG ATTTATCGCA ATTGCTTGCA GGGCTTGGGG CAAAAGATCC TTCCTCTAGT 2520 

TTCTAGCTTT ATTGAACTAA TCGGAAAAAT CGTTTTTGTG GTTTTGATTA TTCCTTGGGC 2580 

AGGATATAAG GGTGTTATCC TTTGTGAACC TCTTATCTGG GTTGCCATGA CAGTTCAACT 2640 

GTACTTCTCA TTATTCCGTC ATCCCTTGAT AAAAGAAGGC AAGGCAATCT TGGCAACCAA 2700 

AGTGCAATCC TAGTTGGATT TACTGAATAA AATCCATTTC CTCTAGTGAA AATCGAAAAA 2760 

ACTTGTGTTC TCTTCTTTAG TTTGGTGTTG AAAATAGTTT AACAGACTTT TGACTTCTTT 2820 

TATATGATAT AATAAAGTAT AGTATTTATG AAAAGGACAT ATAGAGACTG TAAAAATATA 2880 

CTTTTGAAAA TCTTTTTAGT CTGGGGTGTT ATTGTAGATA GAATGCAGAC CTTGTCAGTC 2940 

CTATTTACAG TGTCAAAATA GTGCGTTTTG AAGTTCTATC TACAAGCCTA ATCGTGACTA 3000 

AGATTGTCTT CTTTGTAAGG TAGAAATAAA GGAGTTTCTG GTTCTGGATT GTAAAAAATG 3060 

AGTTGTTTTA ATTGATAAGG AGTAGAATAT GGAAATTAAT GTGAGTAAAT TAAGAACAGA 3120

TTTGCCTCAA GTCGGCGTGC AACCATATAG GCAAGTACAC GCACACTCAA CTGGGAATCC 3180

GCATTCAACC GTACAGAATG AAGCGGATTA TCACTGGCGG AAAGACCCAG AATTAGGTTT 3240 

TTTCTCGCAC ATTGTTGGGA ACGGTTGCAT CATGCAGGTA GGACCTGTTG ATAATGGTGC 3300 

CTGGGACGTT GGGGGCGGTT GGAATGCTGA GACCTATGCA GCGGTTGAAC TGATTGAAAG 3360 

CCATTCAACC AAAGAAGAGT TCATGACGGA CTACCGCCTT TATATCGAAC TCTTACGCAA 3420 

TCTAGCAGAT GAAGCAGGTT TGCCGAAAAC GCTTGATACA GGGAGTTTAG CTGGAATTAA 3480 

AACGCACGAG TATTGCACGA ATAACCAACC AAACAACCAC TCAGACCACG TTGACCCTTA 3540 

TCCATATCTT GCTAAATGGG GCATTAGCCG TGAGCAGTTT AAGCATGATA TTGAGAACGG 3600 

CTTGACGATT GAAACAGGCT GGCAGAAGAA TGACACTGGC TACTGGTACG TACATTCAGA 3660 

CGGCTCTTAT CCAAAAGACA AGTTTGAGAA AATCAATGGC ACTTGGTACT ACTTTGACAG 3720 

TTCAGGCTAT ATGCTTGCAG ACCGCTGGAG GAAGCACACA GACGGCAACT GGTACTGGTT 3780 

CGACAACTCA GGCGAAATGG CTACAGGCTG GAAGAAAATC GCTGATAAGT GGTACTATTT 3840 

CAACGAAGAA GGTGCCATGA AGACAGGCTG GGTCAAGTAC AAGGACACTT GGTACTACTT 3900 

AGACGCTAAA GAAGGCGCCA TGGTATCAAA TGCCTTTATC CAGTCAGCGG ACGGAACAGG 3960 

CTGGTACTAC CTCAAACCAG ACGGAACACT GGCAGACAAG CCAGAATTCA CAGTAGAGCC 4020 

AGATGGCTTG ATTACAGTAA AATAATAATG GAATGTCTTT CAAATCAGAA CAGCGCATAT 4080 

TATTAGGTCT TGAAAAAGCT TAATAGTATG CGTTTTCTTG TGGAGATATT TCCTTCAATT 4140 

TTGCTACTAT ATTAAACAAA AATCAAAAAG CAAACTAGAA AGTTATGCTC AAATAAAATC 4200 

TAAATTTGAC AATGTAAACC GAGTCGGATA GCTTTAAGTA CTGTTTTGAG GTTGAAGATA 4260 

CGATTTTTGA TAGGAACTCA TCAATTTTAG ATTTTTAAGC AGCATCAATA AATTGCTTCC 4320 

TTGTTTTGTC ATAATTTTTT TATTTAAAAA ATTATGACma GAGTGTGCTA TTCTTTTTAT 4380 

GAGAGGTGTA TGAATATGAT AAATGTATGT GATAAATGTA TGTGATGTTG GAAAAAGAAT 4440 

AAAAGAACTT AGAATATCTT CAAATCTTAC TCAAGATAAG ATTGCTGAGT ATTTGTCTTT 4500 

GAATCAAAGC ATGATTGCCA AAATGGAAAA AGGTGAAAGG AATATCACGA ATGGATTTAA 4560 

GTAATAAAGC TTCAAATCTT AGAAAAAAGT TGGGAGCTGA TGGTGAATCG CCGATAGATA 4620 

TTTTTAAATT GGTACAAAAG ATAGAAAATT TGACGCTGGT ATTTTATGGA CTCGGAAAGA 4680 

ATATTAGCGG AGTCTGTTAT AAAGGAACTC AGTTCAGTCT CATTGCAGTC AATTCAGACA 4740 

TGCCATTAGG AAGGTAAAGA TTTTCTTTAG CACATGGACT GTATCATCTT TATTATGATG 4800 

AGGTGAAGAA GAGTTCAGTC AGTCTTATCT TGATTGGTGA AGGAGATGAA ACTGAAAGAA 4860 

AAGCGGATCA GTTTGCTTCT TATTTTTTAA TTTTCCCATC TTCACTGTAT AGGATGGTTG 4920

AGGAAATCAG AGAAAATGCC AATAGAACTC ATCTTGAAGT AGAAGATATT ATAAAATTGG 4980

GTCAGTTTTA TGGTATCAGT CATAAAGCTA TGTTATATAG ATTGAGGAAT GATGGATACC 5040 

TTGATGCAGA AGAAATTAAA AATATGGATA TTAGTGTTAT AGAGACAGCT TCAAGATTAG 5100 

GCTATGATAC AAGTTTATAT CGTCCTTTGT CAGAAAGTAA AAAAGAAATG GCATTAGGAT 5160 

AATATATTAA TTCAACTGAA CAACTTTTAG AAAATAACAG AATTTCGCAA GGGAAGTATG 5220 

AGGAACTGTT ACTAGATGCT TTCAGATATG ATATTGTATA TGGGCTAGAT GAAGAGGGGG 5280 

GAGTTGTCGT TTGACTAGTC GTGTATTTAT TGATGCAGAT TGTATTTCAG TATTTTTATG 5340 

GGTTGGCACT GAACATCTTT TAGAAAAGCT CTATTTGGGT AAAATTGTTA TTCCACAAGA 5400 

GGTGTATGAT GAAATCAATA TACCTACAAT TCCCCATTTA AAATCTAGGA TAGATCAGTT 5460 

GGTAGCTAAG GGTTCAGCTG AGATTGTGAG CATAGACATT GGAACTGAAG AATACGCATT 5520 

ATATAGAGAT TTAACAAGAA ATCATGATAG TAACAAGATT ATTGGTAAGG GAGAAGGGGC 5580 

ATCTATTTCC TTAGCGAAAA AGCATAATGG GATATTAGGA AGTAATAACC TAAGAGATGT 5640 

TAAATCATAT GTAGAAGAAT TTTCTTTAGA ATATATGACA ACAGGAGATA TACTGATTGA 5700 

AGCGTTTAAA GCGTAATTTA TTACTGAATA AGAGGGCAAT CATATCTGGA ATAATATGCT 5760 

TAAAAAGAGA AGGAAAATTG GTGCAAATTC ATTTTCAGAC TATCTTCGTG GAAGTATTCA 5820 

TCAAAATAGA CAAAAATAAA TTTGGATAAA TCGAACTCAC TATTCAGGAG GCATATGAGC 5880 

AATTCGAAAA AGAAAAGTGT CAAATTGAGC CTATAGGAGT AGAAGTGAAA TAGTAAGTCC 5940 

TGCATAGTGG ATGAGAGAAA AGTTCTCCTT GAAGTTTTCC TGAACTATCA GTCGCATGTC 6000 

AAACGATATG TAGGGTAATG TGAGAGGGGA TAGCGAGTAG TTTTTGGTTA TTTTATCAAA 6060 

AAACTTATAT TTTATTATAC CGAATGATAA AATATAATAA AAATGATAGA ATAAGGAAAA 6120 

AACATGAATG TCAAAAAGAT AATGTCAATT TTTCAATCCT TTTATGTTGA TGTCAGTATT 6180 

GAGGAACTGA CTTTGACTTT ACCAATCAGT TTTGTAAAAA GGTTTGAGTA TACTCAAATG 6240 

ACTTTTCATA AGGAATCATT TTTATTGATT AAAGAAAAGA GAAGGGGGAG TTTGAGTTCA 6300 

TTTGTTACTC AGGCTCGCAC TATGGGTGAA AAAGCCAATA TGGATGTTGT TTTGGTGTTT 6360 

TCGAAGTTAT CAGACAGTGA AAAAAAGCAA TTACTTCAAG CTAGAGTTCC GTTTGTAGAC 6420 

TTTAAGGGAA ACCTCTTCTT CCCTCCATTG GGACTAGTAC TCAATGCGAA TGATACTGAA 6480 

GTCCCTAAGG AATTAACACC TAGCGAACAA TTAACGTGGA TTGCCTTTTT ATTGACAAAA 6540 

GGTCAAAAAG TAGTAGATGT TGATTTGCTT TCACAAGTCA CTGGACTTCC AAACTCAACA 6600 

ATTTATAGGT GTTTGAGGAC TTTTAAAGCT TTATATTGGT TAAACAAGCA AAATAAGCTT 6660

TACACATATA CGGTGTCAAA GAAAGAATTA TTCTTAAAAT CCGTGTCATG TTTATTTAAT 6720

CCCATCAAAA AACGGATTTT ATTGCCAGAT GGCGATATAA AGCAGATAAA ATCTGTTTCT 6780 

AACCTTCTAT ATGGTGGTGC TTATGCTTTG TCGCATTCAA CTTTTTTAGC TGAAACGGAT 6840 

GAAAATATTA GCTATGTCAT ATGGCAGAGA AAATTCAATC AGTTATCCTT GCCACTTTCT 6900 

CAGCATGTTT TAAAATGAAA GATGCTAGAG ATATGGAAAT ATCGTCCTTT TGTATCTGAG 6960 

TTTTGGAATG ATTTTAAAAA TAATCATGAT AAACAATTTG TAGATCCGAT TTCTCTTTAT 7020 

TTGACCTTAA AAGATGATGA TGACCCACGT ATAGAGGAAG AGAGTGAAGC ACTAGAAAAT 7080 

ATGATATTAC AGTATCTGGG AGAAGATGAT GCCAGCTAAT ACGAAAGTTA TTTTTCAAGA 7140 

AATGTTTGCG GATTTTCAGA ACTATTATGT TCTGATTGGG GGAACTGCTA CCTCTATCGT 7200 

ATTGGATTCG CAAGGATTTA AAAGTCGCAC AACAAAAGAT TATGATATGG TCATCATTGA 7260 

TGAAGTAAAA AATAAGGAAT TTTATACTAC CTTGAATCAT TTTTTAGAAT TGGGAGAGTA 7320 

TCAAGGAAGT CAGAAAGATG AGAAAGCGCA GCTTTTTCGA TTTACAACAA CTAATCCTGA 7380 

GTTTCCTTCT ATGATTGAAC TATTTAGTAT CTTACCAGAA TATCCATTAA AGAAGGACGG 7440

TCGAGAAATT CCCTTACATT TTGACCAAGA TGGTAGTTTA TCAGCCTTAT TATTGGATGA 7500 

AGATTATTAT AATATATTGG TGCATGAAAA AGAAACCATT CAGGGGTATT CGGTATTGAG 7560 

TAATTGTGGT TTATACTCTT CGAAAATCTC TTCAAACCAC GTCAGCTTCC ATCTACAACC 7620 

TCAAAACAGT GTTTTGAGCA GCCTGCAGCT AGCTTCCTAG TTTGCTCTTT GATTTTCATT 7680 

GAGTATTAAT TATTTTTAAG GCTAAAGCTT GGCTGGATAT GAGGGAGCGC TCTGCCACAG 7740 

GTGCTCAAGG TTTAAGTAAG TCCATTAAAA AGCATTTGAA TGACCTTACC CGTTTGACAG 7800 

CTTCCTTGCT AGGAGATGAA AAGTTATCGG CTATAACATC AAGTAGTGCG GTAAAAGCAG 7860 

ACATGCACCG CTTTGTGATA GAATTAGAGC CTGTGAAGTC AACTATTCTT CAAAATAATG 7920 

ACATTTCATT GGATCAAAAT GAAATTTTTG AAATTCTGAA AAATTTTCTC GATGGTTAAA 7980 

ATAATTGTAG CGAGATGGCT ATATTGAATT CGTCTATATC TGGAAACTAG AAAAAACTTC 8040 

AATTTCAGGA GAAAATGAAG TCAATCTTCC CACAATCAAA CGTATAGTAT CAAGGTTTTT 8100 

CAAGACCTGA TATTATGCGT TTTTTGCTTT TCAAAACTTT TTGCCCAGTC TTCGTTTTTA 8160 

TCCTCTAGTC ACTTGATTTG TTTCAGGTGG TTTTTTAGTA TAGTAGAATG AAACGAGAAC 8220 

AGGACAAATT GATCAGGACA GTCAAATCGA TTTCTAACAA TGTTTTAGAA GCAGAAGTGT 8280 

ACTATTCTAG TTTCAATCTA CTATAGTTAA ATCTGCGGTC AAGTCTACTG GTGAATCTAT 8340 

GATTGTAATA CTCTTCCAAA ATCTCATCAA CCACGTCAGT CTTGCCTTGC AGTCTGTATC 8400 

TTACTGACCA AGCTAGTGAT GGATTTAGAA TAGGTGATTT GGAGCGTCCT ATTAGCTAGG 8460 
AAATGCTGCT CATAGTCCTT TGCTGAGGCT AGGGTGTTTC AACATTCAAC ACTCAACTGG 8520

TTGATCTAGT TGATAGGAAG GGAGTTACTA TAAAATACTC AGGCTTCCAT CATATTTTTT 8580

GAAACGATTG TGTAATCAAA ATGTACCAAT ATTGTAGTAT TGGTACAGAA GATGTTGTGA 8640

ATGGATAAAT ATATCATAAC TGCTATCTCA AAAAGATTTC ATATGTCTGT GCATATATAA 8700

TAGACTTCCT GCAAAACTAG AATCCTAGTT CATGATTGAT AATACCAGCA ATCAAATTCA 8760

TTCGTAATCC AAAGCGTTTA CGATGATTTC GATAGGTTGT TGAAAACATT TTAAACGTTT 8820

CTACTTTGGC AAAGATGTTC TCAACCTTGC TTCTCTCCTT AGATAGCGCA TGGTTATAGG 8880

CTTTATCTTC AGCTGTTAGC GGCTTGAGTT TGCTGGATTT ACGTGGAGTT TGTGCTTGAG 8940

GACATATCTT CATGAGCCCT TGATAACCAC TGTCAGCCAA GATTTTACCA GCTTGTCCGA 9000

TATTTCTGCA ACTCATTTTG AACAACTTCA TATCATGACA ATAGTTCACA GTGATATCCA 9060

AAGAAACAAT TCTCCCTTGA CTTGTGACAA TCGCTTGAGC CTTCATAGCG TGAAATTTCT 9120

TTTTACCAGA ATCATTCGCT AATTCTTTTT TTAGGGCGAT TGATTTTTAC TTCCGTCGCA 9180

TCAATCATTA CCGTGTCCTC AGAACTAAGA GGAGTTCTTG AAATCGTAAC ACCACTTTGA 9240

ACAAGAGTTA CTTCAACCCA TTGGCTCCGA CGGATTAAGT TGCTTTCGTG AATACCAAAA 9300

TCAGCCGCAA TTTCTTCATA AGTGCGGTAT TCTAGGCTTA ATTTAGGTTT TCGTCCACCT 9360

TTTGCGTGTT TAAGTTGATA AGCTGTTTTT AATACAGCTA ACATCTCTTT AAAAGTCGTG 9420

CGCTGAACAC CAACAAGACG CTTAAATCGT GTATCAGTTA ATTGTTTACT TGCTTCATAA 9480

TTTCGCAGGG AGTCTATTGA CTCTTTGGTA GGTGTCAATG TTTTTTTCAT CTATCCCGAG 9540

AATTATTTTC CCGCCATTTG TATTTGCAAA TGCTGAGTAG GTTTCCCAGA AAGACTCTGG 9600

AAGATTGTTT TTAGCTTTTT TGTATTCTAA ATCAACCCCT TCAAATTTTA AGTCCATATT 9660

TTTCCTTTAC ATCTGTTTTT TGTGGTTCTG GTATTTGTTC AAGTTGAGTG ATAATATAGC 9720

GAATTGAATT TCGAGAGTTT TTACTCAGTT AATTTCTTTT TTAACCCACT TTAATTGCTT 9780

TTTTAACACG GGTTAAAAAA GAAATTAAAG TGGGTTAATT TTTCTTGA 9828



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3369 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 
CCGCGAAAGA TATTTTTGAA CAAGAGTTTG GACGTGAGGT CCGTGGCTAT AATAAAGTAG 60

AAGTTGACGA GTTTTTAGAC GATGTCATCA AGGACTATGA AACCTATGCT GCCTTGGTCA 120 

AGTCACTTCG TCAGGAAATT GCGGATTTGA AGGAAGAATT AACTCGTAAA CCGAAACCTT 180 

CACCAGTTCA AGCAGAACCC CTTGAAGCGG CAATTACAAG TTCTATGACG AATTTTGATA 240 

TTTTGAAACG CCTGAATAGA TTGGAAAAAG AAGTTTTTGG TAAACAAATT TTAGATAACT 300 

CAGATTTTTA AGTAGTTATT TGAGATGTGC AATTTTTGGA TAATCGCGTG AGGAGAATTG 360 

TTTCTCATGA GGAAAGTCCA TGCTAGCACA GGCTGTGATG CCTGTAGTGT TTGTGCTAGG 420 

CGAAACCATA AGCCTAGGGA CGAGAAATCG TTACGGCAGT TGAAATGGCT AAGTCCTTGG 480 

ATAGGCCAGA GTAGGCTTGA AAGTGCCACA GTGACGGAGT CTTTCTGGAA ACAGAGAGAG 540 

TGGAACGCGG TAAACCCCTC AAGCTAGCAA CCCAAATTTT GGTCGGGGCA TGGAGTACGC 600 

GGAAACGAAC GTAGTATTCT GACTGCTATC AGCTAGAGCT GTTAGTGGTA GACAGATGAT 660 

TATCGAAGGA AGTGGTCCTA GTCACTTCTG GAACAAAACA TGGCTTATAG AAAATTGCAT 720 

ATAGGTTGGG GCTGAGAAAT TTTCTCAACC TCATTTTTTA AAGTGGACAT ATAGAAAGGT 780 

CTTGCAAGAC TGTAACATGA AAAAAGAATT TAATTTAATT GCAACTGTGG CAGCAGGGCT 840 

TGAGGCTGTC GTTGGTCGTG AAGTGCGAGA GTTGGGCTAC GATTGTCAGG TTGAAAATGG 900 

ACGTGTTCGT TTTCAAGGAG ACGTGAGAGC TATTATCGAA ACCAACCTTT GGCTTCGGGC 960 

AGCAGATCGT ATCAAAATTA TCGTAGGAAC GTTCCCAGCT AAGACTTTTG AAGAGCTATT 1020 

TCAGGGAGTT TTCGCTTTGG ATTGGGAAAA TTATTTACCA CTTGGAGCTC GGTTCCCGAT 1080 

TTCAAAAGCT AAATGTGTTA AGTCCAAACT TCACAATGAG CCCAGTGTTC AGGCTATTTC 1140 

TAAGAAAGCT GTTGTCAAGA AATTGCAGAA ACACTATGCT CGCCCAGAAG GGGTTCCTCT 1200 

GATGGAGAAT GGCCCAGAGT TTAAGATTGA GGTCTCTATT CTCAAAGATG TGGCAACTGT 1260 

CATGATTGAT ACGACCGGGT CTAGCCTCTT TAAACGTGGT TATCGTACCG AAAAAGGTGG 1320 

CGCTCCTATC AAGGAAAATA TGGCAGCAGC CATTTTACAA CTTTCTAACT GGTATCCAGA 1380 

CAAGCCTTTG ATTGATCCGA CCTGTGGTTC GGGGACTTTC TGTATTGAGG CAGTTATGAT 1440 

TGCTAGAAAG ATGGCGCCAG GTCTTCGTCG CTCTTTTGCA TTTGAGGAAT GGAACTGGAT 1500 

CAGCGATCGC TTGATTCAAG AAGTGCGCAC AGAAGCGGCT AAAAAAGTAG ACCGTGAGCT 1560 

TGAGCTGGAT ATCATGGGCT GTGATATTGA TGCTCGCATG GTGGAAATTG CTAAGGCCAA 1620 

TGCTCAGGTA GCTGGTGTTG CAGGAGACAT TACTTTTAAG CAGATGCGCG TGCAGGATTT 1680 

ACGTTCCGAT AAAATCAATG GAGTAATCAT TTCCAATCCG CCTTATGGTG AACGTTTGTC 1740 

AGATGATGCA GGGGTGACCA AGCTCTATGC TGAGATGGGG CAAGTATTTG CACCGCTGAA 1800 
AACTTGGAGC AAATTTATCC TGACTAGTGA TGAAGCTTTT GAAAGCAAGT ATGGTAGCCA 1860

AGCAGATAAG AAGCGTAAGT TATACAACGG AACCTTGAAA GTGGATCTAT ATCAATATTT 1920 

TGGTCAGCGT GTCAAACGGC AAGAGGTAAA ATAGAAAGGG ATACTCATGA GTAAAAAAAG 1980 

ACGAAATCGT CATAAAAAAG AAGGTCAAGA ACCGCAATTT GATTTTGATG AAGCAAAAGA 2040 

GCTAACAGTT GGTCAAGCTA TTCGTAAAAA TGAAGAAGTG GAATCAGGAG TCTTGCCTGA 2100 

GGATTCCATT TTGGACAAGT ATGTTAAGCA ACACAGAGAT GAAATTGAGG CGGATAAGTT 2160 

TGCGACTCGT CAATACAAAA AAGAGGAGTT CGTTGAAACT CAGAGTCTGG ATGATTTAAT 2220 

TCAAGAGATG CGTGAGGCTG TAGAGAAGTC AGAAGCTTCT TCGGAGGAAG TTCCATCTTC 2280 

TGAAGACATC TTACTACCCT TGCCTCTGGA CGATGAGGAG CAAGGCTTGG ATCCTCTATT 2340 

GCTAGATGAT GAAAATCCAA CAGAAATGAC TGAAGAAGTG GAAGAGGAGC AAAACCTTTC 2400 

TCGTCTGGAT CAAGAGGACT CAGAAAAGAA AAGTAAAAAA GGCTTTATTT TGACCGTTTT 2460 

GGCGCTTGTA TCAGTAATTA TTTGTGTCAG TGCTTATTAT GTCTACCGTC AAGTGGCTCG 2520 

TTCGACTAAG GAAATTGAAA CTTCTCAATC AACTACAGCC AATCAATCGG ATGTGGATGA 2580 

TTTTAATACA CTTTATGACG CCTTTTACAC AGATAGCAAT AAAACGGCTT TGAAAAATAG 2640 

CCAGTTTGAT AAACTGAGTC AACTCAAGAC TTTACTTGAT AAGCTGGAAG GTAGTCGTGA 2700 

ACATACGCTT GCCAAATCTA AATATGATAG TCTAGCAACG CAAATCAAGG CTATTCAAGA 2760 

TGTCAATGCT CAATTTGAGA AACCAGCTAT TGTGGATGGT GTGTTGGATA CCAATGCCAA 2820 

AGCCAAATCG GATGCTAAAT TTACGGATAT TAAAACTGGA AATACGGAGC TTGATAAAGT 2880 

GCTAGATAAG GCTATCAGTC TTGGTAAGAG CCAGCAAACA AGTACTTCTA GCTCAAGTTC 2940 

AAGTCAAACT AGCAGCTCAA GTTCAAGTCA AGCAAGTTCA AATACGACTA GTGAGCCAAA 3000 

ACCAAGTAGT TCAAATGAGA CTAGAAGTAG TCGCAGTGAA GTCAATATGG GTCTCTCGAG 3060 

TGCAGGGGTT GCTGTTCAAA GAAGTGCCAG TCGTGTTGCC TATAATCAGT CTGCTATTGA 3120 

TGATAGTAAT AACTCTGCCT GGGATTTTGC GGATGGTGTC TTGGAACAAA TTCTAGCGAC 3180 

TTCACGTTCA CGTGGCTATA TCACTGGAGA CCAATATATC CTTGAACGTG TCAATATCGT 3240 

TAACGGCAAT GGTTATTACA ACCTCTACAA GCCAGATGGA ACCTATCTCT TTACCCTTAA 3300 

CTGTAAGACA GGCTACTTTG TCGGAAATGG CGCTGGTCAT GCGGATGACT TAGATTACTA 3360

AGCAGTCGG 3369 

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9713 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 

AAGTTTACAA TTTAAATGAA TTAACAATTT TCCCAACTAA AAGCACTCCA GTTACCGCAA 60 

CGTTTGTACT GAATGTACTA AATCGCATTC CATCAACTTC ATCTGTTTCG TCAACTTGAA 120 

CAGATACTAA TTGAAGATTT AATACTTCTG CTGCCATAGC TAGCTCCTCC TATTTAAATT 180 

TTTGGGATTA AGTACTTTAT CCACCCTCAT ATACTCTCTC CACCAGTAAA ATGCAAGCAA 240 

TGATACAAAA TAGATTTAAC TATTTTATAT AGCGAAAACT TACAAATTTT TAAGAAATAA 300 

TTTTTGCATT CTTAAAGATA AAATAGGAAC TTTTAGTAAT AAATATTAAA ATAAATAAAA 360 

TAATAGATAC TATAAAATTT GGAAGTATTA ACCCCAAAAG ATTCATATCA TCTATTAAAA 420 

TATCCTCTAA AGAGTAGTAT ATTAAAGCCA TAATTTTAAT GTTAAGTAAA AATGCAATTA 480 

ATGAAGTAAC AAATGTCAAA AATATAGCCT CACCAACTTT AATCTTAACC ATCTGGTAAT 540 

TAGAAGTTCC TAAAATTTCA AATTGCTGAA TCTCAATCCT TTCTTGATGC GATGACAAAA 600 

ATGCAATTGA AATAATATTT GCAAGTACTA TCAAAATTGG TGCTCCTACA TAGACAATAA 660 

ATGCTACTTT TAGCTCTAAA TCACTGTCAT CTTGAAATTG AGATAGTATA TTCTGAGAAA 720 

TCATTTGAAA ACTAGAAATT AGTAATATAG CTCCTGTAAT TGCAGCACTG ATAGATTTTA 780 

TATAAGACTT ACAATATAGT AAATTCCACT TCGAAACAAT GAACATAAAA TTATTTCTAA 840 

ATATAATTAT AGAAAGTAGT TTGATAAAAC ATGACTGTAT AAAAGGAGAT AATTGATAAA 900 

TAATCACAAT ATCTAAGATT ACAATATTGA ATATTATCTG GGCCTTCGCT AAAATTGTGC 960 

TATCTTGGAA AATTTGTTGC AAAGAAAGCA ACCAGATAAC ACTAAAACCA GCCAATAGCA 1020 

GTATTCTTTT TACTATTGAA AGAACATGCC TTATTTTAGA ACTCTTCCTA TTTCTAATCT 1080 

TCTTGAACGT ATAAAAGCAA CCACTTAGAA AGGCTAAAAA TGAAATCAAC ACTACTGTAA 1140 

TGATACATCC AACAGCACTC GTTTGAAATT GGATATCAGG TAATATATTT TCCCCGAAAA 1200 

AGTATTGTAA AAAATAATAA TAATTTGACG TAACAAATAT AGAGCATAGA TATGCAATAA 1260 

AACTAATAAT CGAGGAAATG ATAAAAATCT GTCCCCCCAC AAGAAATGAT AGTTGAAGGC 1320 

GACTTGCTCC CAACACCTCC AGAAGTTCGT AATCATCTCT AAAAATTTCA ACCAACATAT 1380 

TTATTATGTT AGAGAGCACA AAGAATAATG TTACTCCTCC GAATACTATC GGAAACATAA 1440 

AAATTGGTTT AGGATCTGGA AGTCCGACAA ATACTTGCGA ATTATTCTCA ACATTAATTA 1500 

CCCCATTAAC AGCCAATCCC ATAACTAAAC TCGAAACAAA AATTACTGGT GAAACGCCTA 1560 
ACCATTGTTT CTTATTATGT AAAAATTGAT AGTAAACTAA TCTGAGCATC TCTATTCCTC 1620

CGTAGTTGAT TGTACCTCTA AGATTTTATA CAACTCTTCC CCGCTAGGTC TATGAAGTTC 1680 

TTTGAAAATT TTTCCATCTT TCAATATTAA TGCACGATCA GTTTTCGAGG CCAATTCTAT 1740 

ATCGTGCGTT ACCATAATTA CACACTTACC CGCCCCTACT AACTCTCTCA ATAATTCAAA 1800 

AATTACTTCA CGAGAAACGC TGTCTAAAGC CCCAGTTGGC TCATCAGCAA ATATTATATC 1860

ACTATCAGCA ATAACCGCTC TAGCTATAGC AACCTTCTGT TGTTCTCCAC CAGACAGAGT 1920 

TCCAACAAAA TCGTTTAAGC CAGCATTAAA CTTCATTCTT TTGAGTAAGT TTTCTACATT 1980 

TTTAATAGTT AATTTTTTTT GTGATAATCG CAAAGGAAGT GCTATATTTT CTATTACCGG 2040 

CAGGGAAGGT ATTAAATTGT ATGCTTGAAA TATAAAAGAT ACTTCGTTAC GTCTTATACT 2100 

TGACAATTTT GCATTTCTGA TTTTATAGGG GTTGATTCCA TTTAAAATTA CTTCCCCACT 2160 

TGTTGGTTCA AGCAAACTAG AAATACATTT TAATAAAGTT GACTTTCCAG AACCACTAAT 2220 

TCCTAGAATA CTTATAAATT CTCCTCTCGA AGCAGAAAGA GAAACATTTT TCAGCACTTG 2280 

CAACGTTTTA TTATTTCCTA GTAAAAATTG ATGATACAGC CCTTTCACTT TTAATATATA 2340 

ATCTTTATCC ATATTCTTGC CTCCAATCAC TTAATTTTGA AAAGTGTTCC ATTTTCCAAT 2400 

TTATATATAT CAGTGTATCT CTTGTCATTT AAGTCATAAT GATGTGAAAC TTCAATAAAT 2460 

GAAATACCTA AATTGAACAG AATATCATGT ATGGAATTTG AATTATCATT ATCTAAATTA 2520 

GCTGATATTT CGTCAAATAA GTACACTTTA TTATTTCTAA TCAGAGCTCT AGCTAAAGCT 2580 

ATTTTTTGTT TTTGACCTCC AGACAAATTA CTACCATTTT CACCACATTG ATAATTTAGT 2640 

ATATCTATCT TTTCTAATTC TTCATATAGA TTTACCTTTT TTAACACCTC AATTATCTGA 2700 

TCATCTGAAA AATATTCATT TTGAAATAAA GTTACGTTCT CACGAATAGT AGTGTCAAAA 2760 

ATATATGGTG TCTGATCAAC TGTTGGTATT GAATCTGAAC TCTTTTTCCC ATGTGATAAC 2820 

AAATTTACAT AACCTTTTTG TGGCTTTAAA GAACCATTAA TTAAATTTAA AATCGTTGTT 2880 

TTCCCACTAC CAGAAGTTCC TGTTAATAAT ACCCTAAATG GTGACTTAAA TGAGAAGTCA 2940

ATACTTAATT TATTTTCTGG TGTAATAGAA TATACAACAT CTTTCATGTG TATCTCATCT 3000 

ATTGATGAAG TATACAGTCC GTTATTATCA TGTTCAGCGT CTATAAAATT CTTCTCTCCA 3060 

CTTAAGTATT TTAAAAACGG TTTCCTTAAA TCTTTGGTTG TATTTATCTT ATTTAATGAA 3120 

TAGGCAATTG ATTGTATCGG CCCTAAAACT TTATCGTTTG CTAAGAAAAT ACCTATCAGT 3180 

TCACTAAAAG AAAGGCTTTT ATGATAAATT ACAAAATAAC ATCCTACAAC CAAGGGAACT 3240 

AGAAAGCAAA AACCTGAAAT TAGTACTGCA ACCAATTTTG AAAGAACCTC TGATCGTTTC 3300 
AAATTAAAAG TAGAATCTTC TAGTTTATCC AACTTTTTAT CCGACAAACT AATTATTTCT 3360

TTAGTAACAG AATAAGATTT TAATGTCTTA AAACCATTAA AAATTTCTTT TATTATGTGA 3420 

GTATACTCTG CATTGCTGTT AGAGTACTCA TTAGCTGAAT TAGACAACAT CTTCTTCATA 3480 

AAGACAGGTA CTATAATCGG CAATGCTGAT AATACAATAA ATATTATTGA nACTAGGAAG 3540 

TTTAAATAAA GCATAAAACT TAGAGAGACG ATGAACAACA ATATTGAAGA AATTATTTCA 3600 

AAAATTTGTC TAAAATAGTT TTCTTCGATT AATCTCAAAT CATTTGACAA AACTGAAATA 3660 

ATAGATGAGT AATCTTTAAC CATTTCAGAA GAAAGATACT GTTCTCTAAA ATATCCTTGT 3720 

TTAATTTTTA CATTTATATC TTTAGTTATT GATGCTTCCG TTACTTCTAA ATAGTAATTT 3780 

GATATATAGA TTGCTGACCA ACCCAGAATA CTTATAGCAC CAAATCTTAG AACGTCAGAA 3840 

AATGAGGAAG TCTGATTTAA ACTACCTGCA TATACAATAA TTCCTGAGAG CAAGACACCA 3900 

TTAAACGAAG ATAGAAATAT TAAAATCCCC ATTAATATAA GTTTAGTCTT TTTTATAAAT 3960

TTTAAATAAT TCATAAGTTA TTCCTTCCCA CTTCTTCAAA GAAATAATTT AAAGTATCAA 4020 

TCATTAAGAG AACATCTGAT GGAGTAAAAC CTCCATGACC AGCTGCTTTG TTTAAATACA 4080 

ACAAACTTTT AACTCCAATA GAATTTAATT TCTTTGACCA CTCTATCACT TCGTTATTAT 4140 

TAATATATGG GTCTTTCTCA CCCAAAATAT TAACTATAAC AGTATTTGAG TCTCGTGCCT 4200 

TTTCAATATT TTGCATAGGC GAATATGACT TTATATAAGC CTTTACTTCA GGGTCTCTAA 4260 

TATCTCCCCA CTCTGCTATT TCGGTCTTAG AAAGAGGATC ATTTGGATTC TGAAGTGTAT 4320 

CATAAGGATT TATAAATGGC GAAAATAAGA GAATGCTTTG CAATAAATTT TTTTCCTCGT 4380 

TCAACACCGC ACCAGCAATT ATTCCACCTG CACTAGAAGT TATTAAACCT AATCGCTTAC 4440 

TGTCAATTAC ATCATTTTCC CTTAAATAAT TTACTCCCTC AATAAAATCT CTGATAGAAT 4500 

TCCATTTGTT TAACGCCTTT CCTGAGCGAT ACCATTCACC ACCCAAATAG CCTCCACCTC 4560 

TTACATGAAC TATAGCATAA ATAAAACCTG CATCTATTAT AGATAACATA ATTTCATCTA 4620 

AATCAGAATT ATCATTCTTA CCATAAGCCC CATAGACACT TAGAATACAT TTTTTTCTJ'C 4680 

TTGGGAGCTC ATCCGTATCT TCACTTTTCC AAAATAAAGA AATCGGTATG CTTACATCAT 4740 

AACTGTCTTT TTTAGTCCAA ATCACCTTAG AAAAATATTT AGTATTATTC GATTTTATGA 4800 

TGGGTCTTTC AAATTCAGTT TTTAATGTAT TTTCTATTAA ATCAAAACTA AGTATTTTTT 4860 

CGTAAAAAGT TCTCCTCTCT AAAAACAGAA GAACACGATC AGAAAATGAA TTTTCATAAA 4920 

GTGTTGTCTT TTCATCAAAT GTTATCTTAT TAACACTCAA CTCCCTCAAA CTATTATTTT 4980 

TAAATGTAGC AAGATAAAAG ACGGAATTCG CTGCGTTTGA ACAGTCTAAA AGGATATAAC 5040 

GTCCTATACA GTGAACTCTT CTAGCCCTAT CTTGATATGG TATAGTAATA GAAACTCTGT 5100 
CTCCCGAAGA AGTTTCCCTT AGAATTAGTT GATCTTTCTT TTCTTCAGTT GAAGAGAGCC 5160

CAAGAAAGTA CTGTGCTTTT TCTGTACTAA ATAGAGCGAT ATCTCTAGGT GTTGGGGCTA 5220 

CCGTTTCTGT GTAAGAGTGT CTAACAAAAC CCGTCCGGTC GAAACTGTAT AGAAAAATCC 5280 

TGCCTTTCTG AAAGTCTACT GACTTTACAA AACAATTATT GCTATCAATG TGGACTATTT 5340 

TTAATCGAAA AGAGCATTCG TTTTCTTCAA ACAGTTCCTC TTCTGTAAAG CTATCAAAAG 5400 

ATTTATAGAA TAACTTACTT GGCCTCCCGT ACTCTTTGGA GCGAGTATAC ATAACACCGA 5460 

ATTTACCCAA ATAGAACGAA CTTTCTACTG AAATATCTTC AATGATAAAT AACTCTTCCA 5520 

TAGTATATTT TTTTATTCCA ATTAAATTAG TCGTACGCAG TGAGGATACA ACCAAAACTA 5580 

TATAACTCTC ATCAGATGAA ATCCTAACAT CCTGTAAGAT ACTATCATCT GGCAAAGTAT 5640 
ATTTTTCCAC ATCAAAGACA ATTTTAAGTG AATTTGAATT GTCTAAACTG GAAGAACTAA 5700

CCTTAGGAAT CCAGTCATTA TCTTCGACAT ACCATTCCTT TATTACACCA GTATTGGGTA 5760 

TACTCCAATT ATCAAATTGG TACCAATATC GCCCTCTCCT AAATATCAAA GAATTCCATT 5820 

TTTTTAATTC CTGAAATGAT GAAGAGATAG ACCTCTTATA GTGTGTTTTT TCCTGTATTG 5880 

TATTTAAAAA TATTTCATTA CTCTGATTCA CAAGTATGAC CCCTTAATAA TGGTATCTAA 5940 

ATATTATATT TGAGGAAGAA TCGTCAATTT ATTATCCATT ATTGATACCA ATCCAATTGC 6000 

AACACCCGCA AATCCCGAAG CAATATCTGT TGTTATCTTT AAACCATTAT CTCCCGCAAT 6060 

AACAAATCCT TCTTCAATTA CACACAAATA TCTATAAAGT TGTTCAATTA ATTTCTTTTG 6120 

TCCTGAAAAG TTATCATCGA TATCACTATA TATATTATTA GCAACTTCAA GACCACAAAA 6180 

TCCGTTAAAT AAACCTGGTA ATACACAAAA AACTACATCA GTTGCCCTCT CTAAAGAAGT 6240 

TAAATATTTT AAGTATTTGC TTGACAAGAT TTCTTTATTT CTATTAATAA GTAAAAGCAG 6300 

GCCAGCACTT CCAGTTGCTA GATATGGTAG TAATCTATGA CCTTGGCTGT ACTGCAATGA 6360 

ATTATTACTA TCTACTTTAT AAGCAACTAA TTCTTTATCT ACAGCCAATT CTAGACCATT 6420 

TTTATAGATA CTTTCACCAG TTAATTTATA AGCTTCACCG AAGAGCCAAG CTACCCCTGC 6480 

GTGACCATAT AGTAATCCAC CAAAATTCTC ATAAGGATCG TTACTCTGAA CATCACTAGC 6540 

GCCAACTTTA CAAAAAGTTT CTGGATTTTC TATATAATTT AAAGTATATT CTCTAAGCCT 6600 

AATTAGTATT TCTTCTCCTA GTTTATTATC AATTCCCCCT TTACTAAGAA AATACAGTCC 6660 

AACCAGTAAA ATTCCAGCCT GCCCACTATA TAAATTTTTA TTTTGTGAAT TCTCAAATAT 6720 

CTCTATAAAA TGAGTTGTAA AAAGTTCAAC TGCCCGATCT ATCTCCCCAA ATTCATAAAT 6780 

GAGCCAGATT GTACCAATTT TACCATCAAA AAGACCAGAA AGGGACGATT TCTTAAAATT 6840 
ATTTACTGCC TCATTAATAA CCTGTGTTCG AATCTCATAA TAGTCATCAA ACTTGAAATT 6900

TTTTACTTTC TTAGCTAGTT GTTGATAACT CCAAAGGATA GCTAAATCTG AAAACGCAAT 6960 

TCCTTGATTA AAATTCAGAC CATAATAATG AACTGGGAAG AATCTTGATT GAAATTCTTT 7020 

ACGCCACTGT CCATAAGTTA GCGTAAACCC TCTCAATAAT TTTATAATAA AATCTTGTAT 7080 

ATCTTGCTCA CTCTCGATAG TTCTAATCTC ATGCATGGGT TTTAAAACTT TTTTCCTGGA 7140 

AATATTCTCA ATCTGTGGAC ATTTAGAATC TAGATATGAC AATAAACTTT CTACATAATC 7200 

TATATGTTCT CTTGTATAAC CCAAAGACTC AAATAGTTTT TTTCCTTCTA TCCTGGTTTG 7260 

ACTTACATAG TTGTATGTCA AATCCGATGT AGTTACTAGT GGCATGTATA AATAATGAGC 7320 

TATTTGTCTA ATACCATACC AATCTATCTC ACTGGGAAGT GTTTCTCGCC ATGCTCTAAA 7380 

ACCAGGGGCT GCAACTTTAT GTACAACTTT TTCATCATTT GAAAAGACAG CCTGTTCCCA 7440 

GTCTATTATA CTAATCTCAT CTTCATCCTT AACCAAGATA TTTCCTAAAT GTAAATCTTG 7500 

ATGATATACA TTTTCAGAAT GAAACTTATT CGTTAAATCG ATGAGTTTTT CTACTATCTT 7560 

TGAAACTCTC AATAGATAAT CTTTGGTCTT ATCAACAACT TCATATAAAG GAAAATTATT 7620

GGTAACCCAT CTATTTAGTG GAACGCCCTT CATATGTTCA ATTCCTAAGA AGGTGTGCTC 7680 

CCAGATCTTA CCGTGCCAGT ATATTTTAGG CGTCTCACTC CATTCATTTA GAATTTTTAG 7740 

TGCTTTGCAC TCCGAAGCTA ATTTCTCTGA AGAATAAGTA CCATCAAATC CTAGACCTGT 7800 

ATACGGTCTA GCCTCTTTTA AAATTATTTT TTTCCCATCT TCTTTTAGCC TAGCATTATA 7860 

TATCCCACCA CTGTTTGAAA ATCTAATTGC ATTATCTATA ATAAAGGGAA AGTCTCCCTG 7920 

TTTTTTATCT TTCTTGTCAA GCCATTTATT CAAAAAGTCA GGGGGCACTA TACCTTTTGG 7980 

AATTTTAAAT ACTGGTAAAC GTTCATCTTT AACAACTTCA TCGCCAACAA TTAATTCATC 8040 

AATAGCAACC TTCTTTTCAT CATCCCTTGA CGGCCTAAAC ACACCATACC TCAGATATAT 8100 

TGGTGCTTCA TCCCAACGTT TATCGCTTAA AATATATGGC CCATTATATT GCTTTAAGGC 8160 

ACTTTCTAAC CTTTGCAAAA CCGACTCTAA TTCATTTTGA TTTGGATAAC ATGTAATAAA 8220 

TTTACCAGAA AATCCTCGAC TAACCAATTT CCCGTTTCGC ATGATAAATT TGTCTTCTGT 8280 

ACTAAGATGT TTAAATGGAA TTCGCATTTC ATGGCAAATT TTTGCTACAT CTTGTAACAA 8340 

TTCATGTGAA CTGTTATACT CTGAACTAAT GTGTATTTTC CACCCTTGTC TTTCAACAAA 8400 

TTTTCCAATA GGGTATTGAT AAACCCACTC ATCATTATTC ATTACTTCGT GCCAATTAAA 8460 

AGGCAGACTT ACTTGGTACT TTATGCTAGT ATCTGTACTA TAATCATTAT TAGTGAAAAA 8520 

GAAAGGATGC TCCAAATTGA AATTATAATC CATAACAAAA TCTCCAAGAA ATTTTATCAA 8580 

ACTTAATATA TCTATAGCTA GACAGACTTA TTTAAATAAA AAGGGAGAAT CCTTTGGATT 8640 
CTCCCCATAT AAGCACTAAC ATTCCAACGT GCACATATTG GAACGACATC CATAACTCCA 8700

GAGAATCTCT AAAGTTTACA ATTTAAATGA ATTAACAATT TTCCCAACTA AAAGCACTCC 8760 

AGTTACCGCA ACGATTTGTA CTGAATGTAC TAAATCGCAT TCCATCAACT TCATCTGTTT 8820 

CGTCAACTTG AACAGATACT AATTGAAGAT TTAATACTTC TTCTGCCATA GCTAGCTCCT 8880 

CCTATTTAAA TTTTTGGGAT TAAGTACTTT ATCCACCCTC ATTATACTCT CTCCACCAGT 8940 

AAAATGCAAG CAATTATACA ATGTTGTCAC ATAGAAAATA ATGTTTCCGT AACTTTTCAA 9000 

AGTAACTTCC ATCTCTCTCC CAAAACTGGA AGTTAGTTTT AGAAGTTACC TAAAAATCAG 9060 

GTCACCTATT TTAAAAAAGC AGCAAACTAT AAACTAGTAG GTTCCACACC AAATGTAGTC 9120 

CCATACTGCC CCATAAGTCA GATTTATAGC GCACCATACC TAAAAACATC CCAAGTGAAA 9180 

CATACAAACA CCAAGCTAGA ATGGTTCCTG TATGATGTGC TAAGGCAAAT AAAACACTTG 9240 

TCAAAGCAAC TCTGATATCT AATTTTCTGA CCAAATTCCA TAAAATTTCT CGATACAGAA 9300 

ATTCTTCAAC CATACTCGCA TTGATTAAGA ACAATAAAAA TGAAAACCAA GGAATTTGAT 9360 

GTTGAAGGCC AATTAAGTTT GCTTGATTCG TGCTTCCTTG AGCATGAATC AGACTAAAAC 9420 

ATAGACTTAT AATCAGTAGG CTAACAAATT CAACACCAAG CCATTTCATC CTAGATTTCA 9480 

TATTGACCTT ATGCGCTTGT TTGCGTTGGC CATACATCCA TAAAAAAGAA ATGAGTGACG 9540 

AACCATAGAG AATCTGTAGT ATAGTTmACT CACCGATACA AAGAAATTTC AATAAGTATA 9600 

GAGrTACCAA TAsGACATTT ACTTGTTGGA ATATATAAAC TGGAATTATT CTTTTCATAG 9660 

TTACCTCCGA AATAAATCTT CATAATCTAA ATCTAATACC TGCACAATCC TTT 9713 

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8657 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:

AAAGAAATTG TCAGAGAGTG GCTAGATGAA GTAGCAGAGC GGGCTAAGGA CTATCCAGAG 60 

TGGGTGGATG TTTTCGAGCG TTGCTACACC GATACCTTGG ACAATACGGT TGAAATCTTA 120 

GAAGATGGTT CAACTTTTGT CTTGACTGGG GATATTCCTG CCATGTGGCT TCGAGATTCG 180 

ACAGCCCAAC TCAGACCCTA CCTTCATGTA GCTAAAAGAG ATGCCCTCCT GCGTCAGACC 240 

ATTGCAGGTT TGGTCAAACG TCAGATGACC TTGGTACTCA AGGATCCCTA TGCTAACTCC 300 
TTCAACATTG AGGAGAACTG GAAAGGGCAC CACGAGACTG ACCACACAGA CCTTAACGGC 360

TGGATCTGGG AGCGCAAGTA TGAGGTGGAT TCGCTTTGCT ATCCTTTGCA GTTGGCTTAT 420 

CTCCTCTGGA AAGAGACTGG CGAGACTAGT CAGTTTGATG AGATTTTTGT CGCAGCGACT 480 

AAGGAAATTC TCCATCTGTG GACGGTGGAA CAAGACCACA AGAACTCTCC TTATCGTTTT 540 

GTCCGAGATA CGGACCGTAA GGAAGACACC TTGGTAAATG ATGGCTTTGG ACCTGACTTT 600 

GCAGTGACAG GTATGACTTG GTCAGCTTTT CGTCCGAGTG ATGACTGTTG CCAGTATAGT 660 

TACTTGATTC CGTCAAATAT GTTTGCTGTA GTAGTCTTGG GTTATGTGCA AGAAATCTTC 720 

GCAGCATTAA ACCTAGCTGA TAGCCAGAGT GTTATTGCTG ATGCCAAGCG TCTTCAGGAT 780 

GAAATCCAAG AAGGAATCAA AAACTACGCT TACACCACCA ACAGCAAGGG CGAAAAGATT 840 

TACGCTTTTG AAGTGGATGG CCTAGGAAAT GCCAGCATCA TGGATGATCC AAATGTACCA 900 

AGTCTACTAG CTGCGCCCTA TCTGGGCTAC TGTTCGGTCG ATGATGAAGT GTATCAAGCT 960 

ACTCGTCGTA CCATTTTGAG CTCTGAAAAT CCATACTTCT ACCAAGGAGA ATACGCAAGC 1020 

GGTCTCGGCA GTTCTCATAC CTTCTATCGC TATATCTGGC CAATCGCCCT TTCTATCCAA 1080 

GGCTTGACAA CAAGAGATAA GGCAGAGAAA AAATTCTTGC TGGATCAGCT GGTTGCCTGC 1140 

GATGGTGGTA CAGGTGTCAT GCACGAAAGC TTTCATGTAG ATGATCCGAC CCTCTACTCT 1200 

CGTGAATGGT TCTCCTGGGC TAACATGATG TTCTGTGAGT TGGTCTTGGA TTACTTGGAT 1260 

ATTCGCTAAG GGGCTCGCTT TAGCTCAACC GATTCTTATC AGAATCACAA GTTTACATTT 1320 

AAAACGTTAA AATTTAAATT TAGAATGAGG TTTTACTTCA TGGAAAATGT TGTTGTACAT 1380 

ATTATCTCAC ATAGTCACTG GGATCGTGAG TGGTACTTGC CTTTTGAAAG CCATCGTATG 1440 

CAGTTGGTGG AATTGTTTGA CAATCTCTTT GATCTCTTTG AAAATGACCC TGAGTTCAAG 1500 

AGTTTCCACT TGGATGGACA AACTATTGTC CTTGATGACT ACTTACAAAT TCGCCCTGAA 1560 

AATCGCGACA AGGTCCAACG CTACATTGAC GAGGGCAAAC TTAAAATTGG TCCCTTTTAC 1620 

ATCTTGCAGG ATGACTACTT GATCTCCAGT GAAGCCAATG TCCGCAATAC CTTGATTGGT 1680 

CAACAAGAAG CTGCCAAATG GGGTAAATCA ACCCAGATTG GCTACTTTCC AGATACCTTT 1740 

GGAAATATGG GACAAGCGCC TCAAATTCTT CAAAAATCAG GCATTCACGT GGCGGCCTTT 1800 

GGTCGTGGTG TGAAGCCGAT TGGATTTGAC AACCAAGTCC TTGAAGATGA GCAGTTTACG 1860 

TCTCAGTTTT CAGAAATGTA CTGGCAGGGT GTGGATGGTA GTCGTGTTTT AGGTATTCTC 1920 

TTTGCCAACT GGTACAGTAA CGGGAATGAA ATTCCAGTTG ACAAAGATGA GGCCTTGACC 1980 

TTCTGGAAAC AAAAATTGTC AGATGTGCGT GCCTACGCTT CGACCAACCA ATGGTTGATG 2040 

ATGAACGGCT GTGACCACCA GCCTGTACAG AAAAATCTGA GCGAAGCCAT TCGTGTGGCA 2100 
AATGAACTCT TCCCGGATGT AATCTTTGTT CATAGTTCTT TTGATGAATA TGTTCAAGCT 2160

GTAGAAGGTG CGCTTCCTGA ACACTTATCA ACTGTTACAG GCGAGTTGAC CAGTCAGGAA 2220 

ACAGATGGCT GGTACACACT TGCCAACACT TCTTCATCCC GCATTTACCT AAAACAAGCC 2280 

TTCCAAGAAA ATAGCAACCT CCTAGAGCAA GTGGTAGAAC CCTTGACTAT TATCACTGGT 2340 

GGACACAACC ACAAGGACCA GTTGACCTAT GCTTGGAAAA CACTTTTGCA GAATGCGCCA 2400 

CATGATAGTA TCTGTGGCTG TAGCGTGGAC GAAGTTCACC GCGAGATGGA AACGCGTTTT 2460 

GCCAAGGTCA ACCAAGTAGG AAACTTTGTT AAAAGTAACT TGCTCAACGA GTGGAAGGGT 2520 

AAAATTGCTA CGGATAAGGC TCAAAGTGAC TATCTCTTTA CTGTCATTAA CACAGGCTTG 2580 

CATGATAAGG TCGATACTGT CAGCACAGTG ATTGATGTGG CGACTTGTGA TTTCAAGGAA 2640 

TTGCACCCAA CAGAAGGCTA CAAAAAGATG GCTGCTCTTA TCTTGCCAAG TTACCGTGTG 2700 

GAGGACTTGG ATGGTCGTCC TGTAGAGGCT ACAATCGAAG ACCTCGGAGC TAATTTTGAG 2760 

TATAATTTAC CAAAAGACAA GTTCCGCCAA GCTCGTATTG CTCGTCAAGT GCGCGTGACC 2820 

ATTCCAGTTC ACCTAGCGCC GCTTTCTTGG ACAACCTTCC AATTGCTGGA AGGAAAACAA 2880 

GAACACCGTG AGGGTATTTA CCAAAACGGA GTGATTGATA CACCATTCGT AACGGTGAGT 2940 

GTGGATGACA ACATCACAGT CTATGACAAG ACAACTCACG AAGCCTATGA AGACTTTATC 3000 

CGCTTTGAAG ACCGTGGGGA CATCGGAAAC GAGTATATCT ATTTCCAACC AAAAGGAACA 3060 

GAGCCAATCT TTGCAGAGCT TAAGGGCCAC GAGGTCTTGG AAAACACAGC TTGCTATGCT 3120 

AAAATCTTGC TCAAACATGA ATTGACCGTG CCTGTCAGTG CGGATGAAAA GCTAGAAGAA 3180 

GAGCAACAAG GTATCATCGA GTTTATGAAG CGTGAGGCTG GACGGTCAGA AGAATTGACA 3240 

AACATTCCTC TGGAAACTGA GTTGACTGTC TTCGTTGACA ATCCACAAAT CCGCTTCAAG 3300 

ACTCGCTTTA CTAACACTGC CAAGGATCAC CGTATCCGTC TCTTGGTCAA GACTCATAAC 3360 

ACGCGTCCAA GCAATGATTC TGAAAGTATC TATGAGGTGG TGACACGACC AAACAAACCA 3420 

GCTGCTTCAT GGGAAAACCC TGAAAATCCT CAACACCAAC AAGCTTTTGT CAGTCTGTAT 3480 

GACGATGAAA AAGGGGTGAC TGTATCCAAC AAGGGATTGA ATGAATACGA AATCCTTGGG 3540 

GATAACACCA TTGCCGTGAC CATTTTGCGT GCATCAGGTG AGCTAGGTGA CTGGGGCTAC 3600 

TTCCCAACGC CAGAAGCACA ATGCTTGCGG GAGTTTGAAG TCGAGTTTGC ACTTGAATGC 3660 

CACCAAGCCC AAGAACGCTT CTCAGCCTAT CGTCGTGCCA AAGCCTTGCA GACACCGTTT 3720 

ACCAGCCTTC AGCTTGCTAG ACAGGAAGGA AGCGTGGTTG CGACTGGTAG CCTCTTGAGC 3780 

CATTCTGTTC TCAGCATACC GCAAGTTTGT CCAACAGCCT TTAAGGTAGC TGAAAATGAA 3840 
GAAGGCTATG TGCTTCGTTA CTACAATATG TGTAGTGAAA ATGTACGTGT GCCAGAAAGT 3900

CAACATCTCT TCCTTGACCT ACTTGAACGA CCATACCCAG TTCATTCAGG ACTATTGGCT 3960 

CCACAAGAGA TTCGTACAGA ATTCATCAAA AAAGAAGAAA TTTAATTTCA AAAAGTAAAC 4020 

ATCAAAAGAA AGGAGGGGCG AAAAAGTAAG AACTAACTGC TGATTCGCCC CTTTTATGGT 4080

AAAAACAATG ACCATTGCAA CGATTGATAT CGGAGGGACT GGGATTAAGT TTGCCAGTCT 4140 

GACTCCTGAT GGGAAAATAC TGGATAAGAC AAGTATTTCA ACGCCTGAAA ACTTGGAGGA 4200 

TTTACTAGCG TGGCTAGATC AACGCTTGTC AGAACAGGAT TACAGTGGGA TTGCTATGAG 4260 

CGTTCCAGGT GCAGTCAATC AAGAGACAGG TGTGATTGAT GGCTTCAGTG CGGTGCCCTA 4320 

CATCCATGGC TTTTCTTGGT ATGAGGCGCT TAGCTCTTAT CAGCTACCTG TCCATTTAGA 4380 

AAATGATGCC AACTGCGTTG GACTCAGTGA ACTACTAGCT CATCCAGAGC TTGAAAATGC 4440 

AGCCTGTGTC GTGATTGGGA CAGGGATTGG CGGAGCCATG ATTATCAATG GTAGACTTCA 4500 

TCGAGGTCGC CACGGTCTGG GTGGAGAATT TGGCTACATG ACAACCCTTG CCCCTGCTGA 4560 

AAAACTTAAT AACTGGTCGC AACTAGCATC AACTGGGAAT ATGGTACGAT ACGTGATTGA 4620 

AAAATCTGGT CATACTGATT GGGACGGTCG CAAGATTTAC CAAGAGGCCG CAGCTGGTAA 4680 

TATCCTTTGT CAAGAAGCCA TTGAGCGCAT GAACCGCAAT CTGGCGCAAG GCTTGCTCAA 4740 

TATCCAGTAT CTGATCGATC CAGGTGTCAT CAGTCTGGGT GGCTCTATCA GTCAAAATCC 4800 

AGATTTTATC CAAGGTGTCA AGAAGGCTGT TGAAGACTTT GTCGATGCCT ACGAAGAATA 4860 

CACGGTCGCA CCAGTTATCC AGGCCTGCAC CTATCACGCA GATGCCAATC TCTACGGTGC 4920 

TCTTGTCAAC TGGCTACAGG AGGAAAAGCA ATGGTAAGAT TTACAGGACT TAGTCTCAAA 4980 

CAAACGCAAG CTATTGAGGT TTTAAAAGGT CACATTTCTC TACCAGATGT GGAAGTGGCT 5040 

GTCACTCAGT CTGACCAAGC ATCTATCTCT ATCGAGGGTG AGGAAGGTCA CTATCAATTG 5100 

ACCTACCGCA AACCTCACCA ACTTTATCGT GCCTTGTCCT TGTTGGTAAC AGTTCTAGCA 5160 

GAAGCTGATA AAGTAGAGAT TGAGGAACAA GCAGCTTACG AAGATTTGGC TTACATGGTT 5220 

GACTGTTCTC GAAATGCGGT GCTGAATGTG GCTTCTGCCA AGCAGATGAT TGAGATATTG 5280 

GCTCTCATGG GCTACTCAAC CTTTGAGCTT TACATGGAAG ACACTTACCA GATTGAAGGG 5340 

CAGCCTTACT TTGGCTATTT CCGTGGAGCT TATTCAGCAG AGGAGTTGCA GGAAATCGAA 5400 

GCCTATGCCC AACAGTTTGA CGTGACCTTT GTACCATGCA TCCAGACCTT GGCCCACTTG 5460 

TCGGCCTTTG TCAAATGGGG TGTCAAGGAA GTGCAGGAGC TCCGTGATGT AGAGGACATT 5520 

CTTCTCATTG GCGAAGAAAA GGTTTATGAC TTGATTGATG GCATGTTTGC CACGTTGTCT 5580 

AAACTGAAGA CTCGCAAGGT CAATATCGGG ATGGACGAAG CCCACTTGGT TGGTTTGGGA 5640 
CGCTACCTGA TTCTGAACGG TGTTGTGGAT CGTAGTCTCC TCATGTGCCA ACACTTGGAG 5700

CGCGTGCTGG ATATTGCTGA CAAATATGGT TTCCACTGCC AGATGTGGAG TGATATGTTC 5760 

TTCAAACTCA TGTCAGCGGA TGGCCAGTAC GACCGTGATG TGGAAATTCC AGAGGAAACT 5820 

CGTGTCTACC TAGACCGTCT CAAAGACCGT GTGACTCTGG TTTACTGGGA TTATTATCAG 5880 

GATAGCGAGG AAAAATACAA CCGTAATTTC CGCAATCATC ACAAGATTAG CCATGACCTT 5940 

GCATTTGCAG GGGGAGCTTG GAAGTGGATT GGCTTTACAC CTCACAACCA TTTTAGCCGT 6000 

CTAGTGGCTA TCGAGGCTAA TAAAGCCTGC CGTGCCAATC AGATTAAAGA AGTCATCGTA 6060 

ACGGGTTGGG GAGACAATGG TGGTGAAACT GCCCAGTTCT CTATCCTACC AAGCTTGCAA 6120 

ATCTGGGCAG AACTCAGCTA TCGCAATGAC CTAGATGGTT TGTCTGCGCA CTTCAAGACC 6180 

AATACTGGTC TAACGGTTGA GGATTTTATG CAGATTGACC TTGCCAACCT CTTACCAGAC 6240 

CTACCAGGCA ATCTCAGCGG TATCAATCCC AACCGCTATG TTTTTTATCA GGATATTCTT 6300 

TGTCCGATTC TTGATCAACA CATGACACCT GAACAGGACA AACCGCACTT CGCTCAGGCT 6360 

GCTGAGACGC TTGCTAACAT TAAAGAAAAA GCTGGAAACT ATGCCTATCT CTTTGAAACT 6420 

CAGGCCCAGT TGAATGCTAT TTTAAGTAGC AAAGTAGATG TGGGACGACG CATTCGTCAG 6480 

GCCTACCAAG CGGATGATAA AGAAAGTTTA CAACAAATCG CCAGACAAGA ATTACCAGAA 6540 

CTTAGAAGCC AAATTGAAGA CTTCCATGCC CTCTTTAGCC ACCAATGGCT GAAAGAAAAC 6600 

AAGGTCTTTG GTTTGGATAC AGTTGACATC CGTATGGGCG GACTCTTGCA ACGCATCAAA 6660 

CGAGCAGAAA GCCGTATCGA GGTTTATCTG GCTGGTCAGC TTGACCGCAT CGACGAGCTG 6720 

GAAGTTGAAA TCCTACCATT TACTGACTTC TACGCAGACA AGGATTTCGC AGCAACTACA 6780 

GCCAACCAGT GGCATACCAT TGCGACAGCG TCGACGATTT ATACGACTTA ATATTCTTCG 6840 

AAAATCTCTT CAAACCACGT CAGCTTCCAT CTGCAACCTC AAAACAGTGT TTTGAGCAAC 6900 

CTGCAGCTAG CTTCCTAGTT TGCTCTTTGA TTTTCATTGA GTATAAAAAC AAGAACACCT 6960

TGCTTGGCGC AGGGTGTTTC GCGTGAAACA GAAGAATTAT CTGGTTTCAA ATGCTACAGT 7020 

TAGACAAACT TATGATAAAA TAGCAGAAAG TGAATGTTTC CTAAGAGCAA TTGGAGGTAT 7080 

TATGCTACAC TTAAAATTAG TAAAACAAGA AATAGAAGCT GAAAAGCCAG CATCTGTAGA 7140 

AGCTTGGATC ATTTCCGTCA AATTTAAAAA AGGTTGCTAC CGACATATAT AGATTCCAAA 7200 

AACAAAAACG TTAGCGGAAC TAGCAGATGT GATTTTATGG AGTTTTGATT TTGCAAATGA 7260 

TCATGCTCAC GCATTTTTCA TGGATAATGT TGAGTGGAGT CATGCAGATT CTTACTTTCG 7320 

TAGCTTTGTT AGTGACGATG TTGAAGAACG TTACACAGAA AATGTCTATC TGGATAGCCT 7380 
AAGTGTCAAA CAAAAATTTA AGTTTATTTT CGACTTCGGT GATGAATGGC GTTTTGAATG 7440

CCAAGTGCTG AGAGAAATCG AGACAGAGGA CGAAGAAGCT TATCTCGTAC GTTCGGTTGG 7500 

AACGTCGCCA GAACAATATC CAGATTATGA TGGTTTTGAC TATGAAGAAT GGTAAAATTG 7560 

AAATCAGTCT GTGTAGGCTT AGTATTTCAA TAGACTTCCT GCAAAACTAG AATCCTAGTT 7620 

CATGATTGAT AATACCAGCA ATCAAATTCA TTCGTAATCC GAAGCGTTTA CGATGATTTC 7680 

GATAGGTTGT TGAAAACATT TTAAACGTTT TTACTTTGGC AAAGATGTTC TCAACCTTGC 7740 

TTCTCTCCTT AGATAGCGCA TGGTTATAGG CTTTATCTTC AGCTGTTAGT GGCTTGAGTT 7800 

TGCTGGATTT ACGTGAAGTT TGTGCTTGAG GACATATCTT CATGAGCCCT TGATAACCAC 7860 

TGTCAGCCAA GATTTTACCA GCTTGTCCGA TATTTCTGCA ACTCATTTTG AACAACTTCA 7920 

TATCATGACA ATAGTTCACA GTGATATCCA AAGAAACAAT TCTCCCTTGA CTTGTGACAA 7980 

TCGCTTGAGC CTTCATAGCG TGAAATTTCT TTTTACCAGA ATCATTCGCT AATTCTTTTT 8040 

TTAGGGCGAT TGATTTTTAC TTCCGTCGCA TCAATCATTA CCGTGTCCTC AGAACTAAGA 8100 

GGAGTTCTTG AAATCGTAAC ACCACTTTGA ACAAGAGTTA CTTCAACCCA TTGGCTCCGA 8160 

CGGATTAAGT TGCTTTCGTG AATACCAAAA TCAGCCGCAA TTTCTTCATA AGTGCGGTAT 8220 

TCTAGGCTTA ATTTAGGTTT TCGTCCACCT TTTGCGTGTT TAAGTTGATA AGCTGTTTTT 8280 

AATACAGCTA ACATCTCTTT AAAAGTCGTG CGCTGAACAC CAACAAGACG CTTAAATCGT 8340 

GTATCAGTTA ATTGTTTACT TGCTTCATAA TTTCGCAGGG AGTCTATTGA CTCTTTGGTA 8400 

GGTGTCAATG TTTTTTTCAT CTATCCCGAG AATTATTTTC CCGCCATTTG TATTTGCAAA 8460

TGCTGAGTAG GTTTCCCAGA AAGACTCTGG AAGATTGTTT TTAGCTTTTT TGTATTCTAA 8520 

ATCAACCCCT TCAAATTTTA AGTCCATATT TTTCCTTTAC ATCTGTTTTT TGTGGTTCTG 8580 

GTATTTGTTC AAGTTGAGTG ATAATATAGC GAATTGAATT TCGAGAGTTT TTACTCAGTT 8640 

AATTTCTTTT TTAACCC 8657 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11384 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 
TCTATTTTGG GTATAGACTT ACCTATAAAG AAAAATATCT ATACACTGCC TTACTAGCTA 60

TACTGAACGA GTCAACAAAA ACGATATATA TTGATGATAT AAATACAGCA AGATTTTTTA 120
ACTTCTTTGG CAATGATATT CCTAATTCGT CTTTAAAAAA AATTGACTAT ATCGCACCTT 180

CAGAAATTGT TTCATTTAGT ACGTACGTTC GACAACGTTC TAAAGTAATT CCTAAAATTT 240 

TGGAACATAT ATTAAAATCA AGTTTTTTAT TAGAGAATAT AGATGTTTCT GGTTACACTG 300 

TAAATATTTT AGAAGATCAA TTAACAAAAC ATAGAACAAT CAAAATTAGT AAAAACTAAC 360 

TGGTTGATCT CATGTATAAA TACCTAACAA AACCACGCGC CTTGCCTGCT GATGGAAAGA 420 

AAGGTACAAA TACATGAATA TCAAAGAAAA AATCAAAAAG AATGGCCAAA GAGTTTATTA 480 

TGCTAGTGTT TATCTAGGCG TTGACCAACT AACGGGCAAA AAAGCCCGTA CAACTGTTAC 540 

AGCAACCACT AAAAAGGGCG TTAAAGTAAA AGCGCGTGAT GCGATCAATA CTTTTGCTGC 600 

TAATGGCTAT ACAGTTAAAG ACAAGCCGAC AATTACAACA TATAATGAGC TTGTAAAAGT 660 

TTGGTGGGAT AGTTACAAGA ATACAGTTAA GCCAAATACT CGCCAATCCA TGGAGGGATT 720 

GGTTAGAGTG CATTTATTGC CTGTATTTGG CGATTACAAG CTATCTAAAC TTACTACGCC 780 

TATTCTTCAA CAGCAAGTAA ACAAATGGGC TGACAAGGCA AATAAAGGCG AAAAAGGGGC 840 

ATTTGCTAAC TACTCTTTGC TCCATAACAT GAATAAGCGT ATTTTGAAAT ATGGCGTAGC 900 

TATCCAGGTA ATACAATACA ACCCAGCTAA TGATGTCATC GTTCCACGCA AACAGCAAAA 960 

AGAAAAGGCT GCTGTCAAAT ACTTAGACAA CAAAGAATTA AAACAGTTTC TTGATTATTT 1020 

AGATGCTCTG GATCAATCAA ATTATGAGAA CTTATTTGAT GTTGTTCTGT ATAAGACTTT 1080 

ATTGGCCACT GGTTGCCGTA TTAGTGAGGC TCTGGCTCTT GAATGGTCTG ATATTGACCT 1140 

AGAAAGCGGT GTTATCAGCA TCAATAAGAC ACTAAACCGC TATCAGGAAA TAAACTCACC 1200 

TAAATCAAGC GCTGGTTATC GTGATATACC AATAGACAAA GCCACATTAC TTTTACTGAA 1260 

ACAATACAAA AACCGTCAAC AAATTCAGTC TTGGAAATTA GGCCGATCTG AAACAGTTGT 1320 

ATTCTCTGTA TTTACGGAGA AATATGCTTA TGCTTGTAAC TTACGCAAAC GCCTAAATAA 1380 

GCATTTTGAT GCTGCTGGAG TAACTAACGT ATCATTTCAT GGTTTCCGCC ATACACATAC 1440 

TACTATGATG CTCTATGCTC AGGTTAGCCC GAAAGATGTT CAGTATAGAT TAGGCCACTC 1500 

TAATTTAATG ATCACTGAAA ATACTTACTG GCATACTAAC CAAGAGAATG CAAAAAAAGC 1560 

CGTCTCAAAT TATGAAACAG CTATCAACAA TTTATAAAAA ATAAGGGTGA CCCATTTCCG 1620 

GGCTACCCTC TTACTATACC AAAAATTAGT AGGGGTAGTA AAAAGGGTAT TAAATTATAA 1680 

AAAGCACTAA GGGAAAGCGC CCCAAAGTGC TTATTTCAAA GGCTTTATAG CCTATAATCA 1740 

CATAAAGAGA TTATTTTTTA AGGTTGTAGA ATGATTTCAA TCCACGATAT TCAGCTACTT 1800 

CACCAAGTTG GTCTTCGATA CGAAGCAATT GGTTGTATTT AGCGATGCGG TCTGTACGTG 1860 
AAAGTGAACC AGTCTTGATT TGTCCTGCGT TAGTTGCAAC TGCAATATCA GCGATTGTTG 1920

AATCTTCAGT TTCACCTGAA CGGTGTGATA CAACAGCAGT GTAACCAGCT TCTTTAGCCA 1980 

TTTCGATAGC TTCAAAAGTT TCAGTAAGAG TACCGATTTG GTTAACTTTG ATAAGGATTG 2040 

AGTTAGCAGC ACCTTCTTGG ATACCACGTG CAAGGTAGTC AGTGTTTGTT ACGAAGAAGT 2100 

CGTCACCAAC AAGTTGTACT TTCTTACCAA GACGTTCAGT AAGAGCTTTC CAACCATCCC 2160 

AGTCGTTTTC ATCCATACCA TCTTCAATAG TGATGATTGG GTATTTGTTA ACCAATTCTT 2220 

CAAGGTAGTC GATTTGTTCT GCAGATGTAC GAACAGCAGC ACCTTCACCT TCAAATTTAG 2280 

TGTAGTCGTA AACTTTACGT TCTTTATCGT AGAATTCTGA TGAAGCACAG TCAAATCCGA 2340 

TAAATACGTC TTTACCTGGT ACATATCCAG CAGCTTCAAT CGCAGCAAGG ATAGTTTCAA 2400 

CACCATCTTC AGTTCCTTCG AAACGAGGAG CGAATCCACC TTCGTCACCT ACGGCAGTTT 2460 

CCAAACCACG TGATTTAAGG ATTTTCTTAA GAGCGTGGAA GATTTCAGCA CCGTAACGAA 2520 

GGGCTTCTTT AAATGTTGGC GCACCAACTG GCAAGATCAT GAACTCTTGG AAAGCGATTG 2580 

GAGCGTCAGA GTGAGAACCA CCGTTGATGA TGTTCATCAT TGGAGTTGGA AGAACTTTAG 2640 

TGTTGAATCC ACCAAGATAG CTGTAAAGTG GGATTTCAAG GTAGTCAGCA GCAGCACGAG 2700 

CTACAGCGAT AGACACACCG AGGATTGCAT TCGCACCCAA TTTACCTTTG TTAGGAGTAC 2760 

CGTCAAGTGC GATCATAGCA CGGTCAATAG CTTGTTGATC ACGTACATCG TAGCCAATGA 2820 

TAGCTTCAGC AATGATGTTG TTTACGTTGT CAACAGCTTT TTGTGTACCA AGACCACCGT 2880 

AACGAGATTT GTCACCGTCG CGAAGTTCAA CTGCTTCGTG TTCACCAGTA GAAGCTCCTG 2940 

ATGGAACCAT ACCACGTCCG AAAGCACCTG ATTCAGTGTA AACTTCTACT TCAAGTGTTG 3000 

GGTTACCGCG TGAGTCTAGG ACTTCGCGAG CGTAAACATC AGTAATAATT GACATTTTTT 3060 

ACTCTCCTTA TGAGTTAAAT TTTTTACACC TCTATAATAC CTTAAAACCC CTCCTTTTTC 3120

AAGAAAAAAC GTTATCTTTG TGCAACTTTT CCTTAACTTT ATAAAGTAAT CGCTTTCTTT 3180 

TGTCTGTTTT ATTCTAACTT TTATGATATA CTGTTTTCAT GACAGATTTA TCAAAACAAT 3240 

TACTTGAAAA AGCTCATGGT GGGTTAAAAA TAAATCCGGA TGAGCAAAGA CGCTATCTTG 3300 

GTACTTTTGA GGAAAGAGTT CTTGGATATG TAGATATTGA CACAGCAAAT AGCCCTCAGT 3360 

TAGAAAAAGG CTTTTTATTT ATTTTAGAAA ACCTTCAGGA AAAAGCAGAG CCACTATTTG 3420 

TGAAGATTTC ACCAACTATC GAATTTGATA AGCAAGTTTT CTACTTAAAA GAAGCAAAAG 3480 

AAACTGATAG TCAAGCCACC ATAGTATCTG AAGAGCATAT TACTTCTCCT TTTGGCCTGG 3540 

TTATTCATAG CAATGCACCA GTTCAAGTAG AAGAAAAAGA CCTTCGACTT GCTTTTCCAA 3600 

AACTTTGGGA AGTTAAAAAG GAAGAACCAG CCAAAACATC CTTATGGAAG AAATGGTTTA 3660 
GCTAAATCTT GCACATATTT AATAAGTGCC CAATATTGGC AGCCGTGCGC TCCAGATAGA 3720

AACTGGCATT TTTCAAACTA TCTTCTAAAG GTTCACTTTT CTCCAAAATA GAAAAGACAG 3780 

CTTGGATATT TTCAAATGGT AGGGGAGGTA AATCTTCAGC AAGACTACCG CAAATAGCAA 3840 

TAACAGGAAC TCCAACAGGG GTTCTTTTTG CAACACCTAT AGGCGCTTTC CCAGCAAAGC 3900 

TTTGACTATC AAGTCTTCCT TCTCCAACAA CAACCAAGTC AGCATCTGAA ACTTTCTTAT 3960 

CAAAGTTGAT TAAGTCCAAG CAGGTATCAA TTCCAGACAC GATACTTGCC TGAGCAAAGG 4020 

CACACAAACC ACCAGCAAGG CCTCCACCTG CTCCTGCTCC TTTAATTTCT AATGTTGCAG 4080 

GTGAGAATTT TTCATAAAAA TCTTGGATCG CCTGATCTAC GACTGCAAAC ATAGTCGGAT 4140 

GTAGACCTTT TTGATTGCCA AAAGTGTAAG TCGCACCTTG ATGACCACAT AAGGGACTCA 4200 

CGACATCTGC TAAAATATGA ATTTGAACAC CTTCAGGAAT TTTATAGCAA TTTTCTGTTG 4260 

AAACAGAAGC TAAGTTTAAT AAGGATTGAC CGGAAGCAGG CAAGACATTT CCATCCCTAT 4320 

CATAAAATTG ATAACCTAAA CCAGCAGCAA TCCCCAGTCC TCCATCATTA CTGGCCGTGC 4380 

CACCAACACC GATATAAATA TCTTTAATCC CTTTAGAGAT GAGATGAAGA ATCAACTCTC 4440 

CAATACCACA AGTTTGGATT TGAAGTGGAT TTCGTTTCTC TAGCGGAATT TTTCCAAGAC 4500 

CAACCAAGTC AGCTACTTCA AATAGTGCCA GTTCCCCTTT TTGAAAATAG CGCATGGCTT 4560 

CTTTTTGTCC AAAAGGGTCT GTCACTTGGA TCCATTTTTC TTTTAGGTCA AGAGAATGTC 4620 

GGATAGCATC TACAGTACCT TCTCCCCCAT CACCAACAGG GCAGAGGAGA CATTCTACAT 4680 

CTGCTATCGA TTGTTGGAAG CCTCTTTTTA TTGCTTCAGC TACCTGTTGA GCTGTCAAGC 4740 

TTTCCTTAAA CGAATCCGGT GCAATTACAA TCTTCATATT TTCCCTCATT CTAAACAGTC 4800 

AATCAAAGGG AGAACTTCTA AAAAATCCCT CTTGTCAACA TGATGTGGTA TTTCTTTTTT 4860 

GAGCACTTCT TTGGCACAAA AGGCGATTCC TAACTTCGCC GACTTCAACA TTAATAGATT 4920

ATTAACCCCA TCACCGATTG CCACCGTTCT TTCTTTAGAA AGTTTTAGTT TCTTTCTCCA 4980

TTTTTCCAGA GTCTCTTTTT TGACCTGGGG ACTTATAATT TGTCCAACTA ATTTTGCTGT 5040

TAAAAGACCT TCTTTGACTT CAAGCTAGTT GGCAGTGAAA TAGGCAATAC CAAGGGATTT 5100

TGCTAATCTC TCCAACTATT GGTGTAAATC CACCAGACAC CAGACCAACT AGGATGCCAT 5160 

TCTTTTGGAG AATAGAGATG AACTCTGGGA CATTTAGCGA TAGATGAATT GAGTTGAAGA 5220 

CGTTATCAAA GACCAAAATA GGAAGACCTT CCAACAAGGA CACTCTTTTT CTTAAACTGC 5280 

TTTCAAAGAC CAACTCTCCT CGCATTGCTC GACTTGTAAT CTGCGAAATT TCCGCCTCAT 5340 

GACCTGCCTC TCTCCCTAAA AGATCAATCA CTTCTTCTAG GATTAAGGTT CCATCTACAT 5400 
CCAAAACACA CAAGCCTTTT ACTTGAGACA TCAGTTCTCC TCTCTAAACA GCCTAAAAAT 5460

CGTATGAAGT CATCATACGA TTTTATCTAT TAATTAACTA AACTATGGTA CAAGTCAAGG 5520 

TATGACTTGC AGGCTGTATC CCATGAGAAG TCACTCTCCA TAGCTTGTTT TTGTAGGTTT 5580 

CTCCAAATGT CTGGATGGTT TCTATACAAG TCCAATGCTG TTTGGAAAGT CCAATTTAAC 5640 

CAATAAGGAG ATAGATTGTC AAAGCTAAAG CCAGTACCGC TTCCTTCGAT TGGATTGAAA 5700 

GCGCGAACTG TATCTCGCAA GCCTCCAACT TCATGGACCA ATGGCAAGGT TCCATAACGC 5760 

ATAGCCATCA TTTGAGACAA GCCACACGGT TCAAAACGAC TTGGCATGAG GAAGAGGTCA 5820 

CAAGCAGCGT AGATTTCCTG AGCAAGTTTG ACATCAAAAG TGATATTTGT TGATAGCTTG 5880 

TCTGGGTAAA TCTGAGCAAA CCATGAGAAA GCTCCTTCAA AGGCTGGATC GCCAGTTCCC 5940 

AAAAGAACAA TCTGAACATC TTCTTGCAAG ATATGGTGAA GACTTTCGAC CACCACATCA 6000 

AAACCTTTTT GACGTGTCAA ACGAGAAACA ATTCCCACCA GTGGAACGTC TGCTCTAACA 6060 

GGCAAGCCAA CTCTTTCTTG CAATTTTGCC TTATTTTTGG CTTTCCCAGA CAAATCTTCC 6120

TGATTGAAAT GATAGTCTAA AAGAGCATCC GTCTGAGGAT TATAAAGATC AGCATCAATC 6180 

CCATTCACGA TACCAGATAC TTTACCAGAC TCCATTTTAA GAATCTGATC CAAATTACAT 6240 

CCAAACTGAC TAGTCATAAT TTCATGAGCA TAGCTAGGTG AAACGGTTGA AACACGGTTC 6300 

GCATAGAGAA TACCTGCCTT CATCCAGTTC AGACAGTTGT TCCATCGAAG GGTGCCATCA 6360 

GCGTAACGTT CAAAGCCAAC TCCAAACAAA TCACCCAACA TTCCTTCTGA AAATTGTCCT 6420 

TGGAATTCTA AATTATGAAT GGTTAAAACT GTTTCAATGT CCTCATAGGC TTGAATCCAA 6480 

CGGTATTTTT CCTTCAACAA GAAAGGAATC ATAGCTGTAT GGTAGTCATG AACATGGAGA 6540 

AGATCAGGAA TAAAGTCAAT CCTTTCCATA GCCTCAATGG CAGCCAGTTG GAAAAAGGCA 6600 

AAGCGTTCTC CGTCATCAAA ATCACCGTAA ACATGACCAC GGAAGAAATA ATATTGATTG 6660

TCAATAAAGT AGAAGGTTAC ACCATTTAAT ACTGTTTTCT TAATTCCACA ATACTGTCTG 6720 

CGCCAACCAA CGCTCACCTC AAAATGAAGC ACATCTTCAA TCTGATTTCC AAATTTAGCC 6780 

TCTACCATAT CATAGTAGGG TAAAATCACT GCAACTTCGT GCCCAGCTTT TACCAGTGAT 6840

TTTGGAAGAG CGCCAATGAC GTCTCCCAAA CCACCTGTTT TTGAAAAGGG TGCACCCTCT 6900 

GCTGCTACAA ATAAAATTTT CATGAATGAA TATCCTCTGT TACTTTAGCA CCTTTCTTAA 6960 

CCACAACTGG ATGTTCTGCA GTTCCTCGAA TCACAACACC ATGCTCAACT TCAACCCCTT 7020 

TGTCCAAGAT AGCATATTCG ACCTGAGCCC CTTCTCCAAT AACAACACGA GGGAAGAGCA 7080 

GGCTATCTTT AACCAAGCTA TCCTTATGGA CATGAATATT ACGTGATAGA ACAGAATTAG 7140 

CTACTTGACC TTCAATAATA CTACCAGAGG CAAACTGAGA AGTGCTTACC TTAGATGTAT 7200

TAGCATAGTA AGTTGGCTCT TCGTTTTTGA CCTTTGTATA AATCTTTTGG TTTGGTGAGA 7260

AAAGAGAATA GAATTTTTGT GATTCAAGCA TATCGATATT CGCTTGATAA TAAGATTTAA 7320

CAGAGTGAAT ATTGGCTAGA TAGCCCGTGT ACTCGTAGGC GAAAGCTCCC TCTTTTACAG 7380

CCAAATCCCG TAAAACATAG CGCAATTTCT CTGGATGTTC TTTTTTAGCT TCTTCTTCCA 7440

AGTGTTCAAT CAACCAAGGT GTATCAACGA CAAAGATATC TGTAGACATA TTGAACGTTT 7500

CAGCTGTTGA CTTGCTATCA AAGAGTTTAT GAGAAAGAAC ATGGTCTGTT TCATCTACAT 7560

CCAAGATTGC ATTTACTTCT GAAATATCTT TCTTAGCTAG TTTTTTATAA ACTACAGTGA 7620

TAGGCTCTTT TGTTGTACTA TGTAGGTGGA AAACTTGGTT CAAATCAATG TTAATAAGAA 7680

CATCGCAGTT GAGGGCAACC GTTTGGTTTG AGCCAGAACG TTTCAAATAA GTAAGAAGCT 7740

GTTGGTAGTA TTCTTTTCCA ACTGTACTAC TTTCTACACG GGTATTGTAA ATTCCTAGAT 7800

AGTAATGGCT AAGAAGGGTT GATAAGCCCC ACTCGCGTCC TGAACGAATA TGGTCAAATA 7860

CTGAGCTGAT ATTATCCTGC TGGAAAATAC CAAAGACACT ACGAACACCT GCATTAGCAA 7920

GGCTTGAAAG TGGGAAGTCA ATCAAACGAT ATTTCCCACC AAATGGCAAA CTTGCTACTG 7980

GACGGTGGTC CGTCAATGTC GACATATTGT GAAAACCAAC TGTATTTCCT AAAATGGCAG 8040

AATATTTATC AATCTTCATC TGTTGCTACC CCCACTACTT CATTATATCC TACAACTTGT 8100

ACTTCATCTG TTCCATCAAT TTCGACACCG TCAGAAATAA TCGCACCTTC ACCAATAATG 8160

GCACGTTTAA TCTTAGCTCC TTGACCAATG ATAGCTCCAC TCATGATAAC TGAATCAAGG 8220

ACTTCCGCTC CTTCGCGAAC TTGCGCGCCT GTTGAAAGGA TAGAATGTTT AACAGTTCCA 8280

TCAACGAAAC ATCCGTCTAC AACTAATGAG TCTTCCACAT GAGCATTTGC CCCGAGGAAG 8340

TTTGGTGGTG AAATCAAGTT TCTTGAGTAA ATCTTCCATT GACGGTTACG ACTATCCAAG 8400

GCATTTTCTG GAGAAATATA CTCCATGTTC GCTTCCCAAA GTGACTCAAT AGTACCAACA 8460

TCTTTCCAAT AACCACTAAA TTCGTAAGCA TAAACACTTT CACCTGACTC AAGGTAATTT 8520

GGAATGACAT TTTTACCAAA GTCTGACATG CCAACCTTGC TCTTTTCAGC AGCGACTAAC 8580

ATATTACGAA GGCGTTGCCA ATCAAAAATG TAGATTCCCA TAGAAGCTTT TGTAGATTTA 8640

GGTTGAGCTG GTTTTTCTTC AAATTCAACA ATACGATTGT TAGCATCTGT GTTCATGATA 8700

CCAAAACGGC TTGCTTCTTT AAGAGGGACG TCTAAAACTG CTACTGTCAA GCTGGCATTA 8760

TTATCCTTAT GAGACTGGAG CATATCATCA TAGTCCATTT TGTAGATGTG ATCCCCAGAC 8820

AAAATCAAGA CATACTCAGG ATTGACACTG TCGATATAGT CGATATTTTG GTAAATAGCG 8880

TGACTAGTCC CCTCAAACCA ACGATTTCCT TCACTTGCAG AATAAGGTTG AAGAATAGAG 8940
ACACCTGAAT TAATACCGTC TAGTCCCCAG CTTGAACCAT TCCCAATATG GTTGTTGAGA 9000

GCAAGTGGTT GATACTGTGT AACGACCCCA ACATTGTGAA TCCCTGAGTT GGCACAGTTT 9060 

GATAGGGCAA AGTCAATGAT ACGGTAGCGC CCACCAAATT GCACAGCTGG TTTTGCGATG 9120 

CTTTGAGTGA GTTTACCGAG ACGAGTTCCT TGCCCACCAG CAAGAATCAA AGCTAACATT 9180 

TCATTTTTCA TTTTCTACTC CTTTTTGGTT TTTATTTGTG ACGGTTTTAG TAGATTTCAA 9240 

GCGACGTTTG ATTTTCCATA CACTTGCTCC CATAGCCGGT AGGGTAAAGG TTAAGGTCTG 9300 

CTCATAATCT TTCCATAGTC CTTCTTGCGT TTGAACAGTT TGATTATGTT CTTTCCAAAC 9360 

GCCTCCCCAC TCTTCCAACT CAGTATTCCA TACTTCTTCG TAAATTCCTG CAACGGGTAG 9420 

TCCGATTGTA AAATCTTTCC GCTCAACAGG TACCATATTA AAGATACAGA CTAACATTTC 9480 

TCCCTTTTTA CCCTTACGAA TAAAGGAAAG AACACTCTGG TCTCGATTAT CCGCATCAAT 9540 

GATTTCAATA CCATCATAGC TGGTATCAAT TTCCCACAGA CAGCGATGAT CTTTGTAAAA 9600 

CTGGTTTAGC TGAGAAGCGA AATACTTCAT CTTAGCATTC ATTGGGTCTT CTAGGTTAGA 9660 

CCATTCCAAC TGTTCTTCAG ATTTCCATTC TAGGAATTGA CCGTATTCGC TACCCATGAA 9720 

GAGCAATTTC TTACCAGGGT GACAAATTTG GTACGTATAG AGATTGCGCA AGCCTGCGAA 9780 

TTGATTGTAA CGATCTCCCC ACATCTTATG CATCATACTC TTCTTGCCAT GAACCACTTC 9840

ATCGTGCGAG AATGGCAAGA GATAATTCTC CTTGAAAACA TACATAAAGC TGAAAGTCAC 9900 

CAGGTTAAAG TCATATTTAC GATAGATCGG ATCTTCTTCG TAGAAACGGA GGATATCATT 9960 

CATCCAGCCC ATGTTCCATT TGTAGTCAAA TCCTAGACCA CCAATCTCTT TCATTCCCGT 10020 

AATCTTGATC GCAGACGAAC TTTCTTCTGC AATCATCATC ACATCTGGAT ATTCTAACTT 10080 

AATAACCTCA TTCAAGCGCT GAAGGAAATA ATAACCTTCA TAGTTGAGAT TTCCGCCATC 10140 

TTTATTAGGT GTCCATGGAG CATCATCATA GTCCAAATAG AGCATGTTGC TAACAGCATC 10200 

CACACGAATA CCATCCAAAT GATAGACATC AATCCAATGC TTAATGCAAG AAATTAAGAA 10260 

GGACTGGACT TCATTTTTTC CAAGGTCAAA ATTAAGGGCA CCCCAACCAT GGTTATGAGC 10320 

CTTATTATGG TCTTGGTATT CAAAAGTCGG TGTCCCATCA TAATAGGCTA AGGCATCATC 10380 

GTTGATGGTA AAGTGACTGG TACCCAGTCC ACAATAACCC CAATATTATG GGTATGACAC 10440 

TCCTCGACAA AATCTTGAAA CTCCTCTGGT CGGCCATAAG CATGCTCTAA AGCGAAGTAA 10500 

CCCATAAGCT GATACCCCCA ACTCAAGCCC AAAGGATGGG ACATCAAGGG CATAAACTCA 10560 

ATATGAGTAT AGTTCATTTC AACGAGATAA GGAATGAGTT CATCCTTGAG CTGGGCAAAA 10620 

CTATAAGGAC TGCCATCAGA ATTTCTTTTC CATGATCCAG CGTGAACTTC ATAAATATTG 10680 

ACAGGACGCT CTTCAAAGCC CCAACGTTTT CTTCGTGCCA GCCAAAGTCC ATCCTTCCAT 10740 
TTCTTCTCAG GAAGCTCTGT TACGATTGCC CCTGTTCCTG GACGAGCCTC ATACCTGACA 10800

GCAAAAGGGT CAATCTTCAT CAGTTGATGA CCATTTTGAC GTGTGACATG ATATTTGTAA 10860

ATATGCCCTT CTTGAGCCAT ATTGGTAAAG ACTTCCCAGA CCCCAAAATC ATTTCTTACC 10920 

ATTGGAATCT GATTTTCAAT CCAGTTGGTA AAATCACCAA CCAAGTGAAC AGCCTGAGCA 10980 

TTAGGTGCCC AAACACGGAA GGTATAGCCA TGCTCTCCAT TTAGTTCTTC CCTATGTGCT 11040 

CCTAGATAAT GTTGGAGATA AAAATTTTCA CCCGTCATAA AGGTTTTTAA TGCTTCTCTA 11100 

TTATCCATAT ACTCCCCTTC TCCTGTAAGC GTTTTCTATG TTTTTATTAT ACTACCTTTT 11160 

TAGAGAAGAT TCAAGTAAAT TACTATACTT CTTTAATTAT TTTGAAAATC TACAACAAGT 11220 

TCACTTACTC GTTCAATTGT AAATCAATAT TTTTTCAAAA AATTGCGAAA ACGCCTTTCT 11280 

TTTTCTACTA TAGTGAAATG AAATAAAACA TGCGCAAATC GATTAAGGAA TTTAATGTAA 11340 

TTTCTAACAA TGTCTTAGAA ATCAAAGTGT ACTATTTTAA CTCC 11384 
(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7577 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 



TGTTGATTTG TTACTAGACG TTGACCAACG TCCTTCGGCT GGAAAAGGAA TTCTCCTTAG 60

TTTCCAACAC GTTTTCGCCA TGTTTGGTGC GACCATCTTG GTACCATTGA TTTTGGGAAT 120

GCCTGTATCT GTTGCCCTTT TTGCTTCAGG TGTTGGAACA CTCATCTACA TGATTGCTAC 180

TGGTTTTAAA GTTCCAGTTT ATCTAGGTTC TTCATTTGCC TTTATCACAG CTATGTCACT 240

GGCTATGAAA GAAATGGGGG GGGATGTATC TGCTGCCCAA ACAGGGGTTA TCTTGACTGG 300

TTTGGTCTAT GTCCTTGTTG CTACCAGCAT CCGATTTGTA GGAACAAAAT GGATTGATAA 360

ACTCTTGCCA CCAATCATTA TCGGTCCTAT GATCATCGTT ATCGGTCTTG GACTTGCAGG 420

TTCAGCTGTT ACCAATGCAG GTCTTGTAGC AGACGGAAAT TGGAAAAATG CTCTGGTAGC 480

CGTTGTTACT TTCCTAATTG CTGCCTTTAT CAATACAAAA GGAAAAGGCT TCCTACGAAT 540

CATTCCATTC CTCTTTGCCA TTATCGGTGG TTACCTTTTC GCACTAACTC TTGGCTTGGT 600

TGACTTTACA CCAGTTCTTA AAGCCAACTG GTTCGAAATT CCTGGTTTCT ACTTGCCATT 660

TAGCACAGGT GGTGCCTTTA AAGAGTACAA TCTTTACTTT GGTCCAGAAG CCATCGCTAT 720

CTTGCCAATC GCTATCGTAA CAATTTCTGA ACATATCGGA GACCATACTG TTTTGGGTCA 780

AATCTGTGGT CGTCAATTCT TAAAAGAACC AGGTCTTCAC CGTACTCTTC TTGGTGACGG 840 

TATCGCAACT TCTGTTTCTG CCTTCCTTGG TGGACCAGCC AATACAACTT ACGGAGAAAA 900 

TACAGGGGTT ATCGGTATGA CTCGTATCGC TTCTGTCTCA GTTATCCGTA ACGCTGCCTT 960 

CATCGCGATT GCCCTCAGCT TCCTTGGTAA ATTCACTGCC TTGATTTCAA CTATTCCAAA 1020 

CGCTGTACTT GGTGGTATGT CAATCCTTCT CTATGGGGTT ATCGCCAGCA ATGGTTTGAA 1080 

AGTCTTGATT AAAGAACGTG TTGATTTCGC TCAAATGCGA AACCTCATCA TCGCAAGTGC 1140 

TATGTTGGTT CTTGGACTTG GAGGAGCTAT CCTTAAACTT GGTCCAGTTA CACTTTCAGG 1200 

TACTGCCCTT TCAGCCATGA CAGGAATCAT CTTGAACTTG ATCTTGCCAT ACGAAAATAA 1260 

AGACTAAGAG TCTAAATACA CCTAATCCAC TCAGACAGCT GAGTGGATTT TTCGTATACC 1320 

ATAATAAAAG TGTCTTAACA AAATTATTAA AATCAAAAAA CGTATAATAT CAGATATTCT 1380 

AAAACCTTGA TACTGTACGT TTTATCATAG AAATTTTTAC TTTATTTTCT CATCAAATGA 1440 

GATTTGCATC AATCTCTTGT CTTACTTGCG TTTCTTCTTC GCTTTCTTCA TTTTGTTAGC 1500 

CATACGTTTC ATGGACTGTT TCATGGCAAA TTCACCAATT TTACCTTTCA AACCGCCACC 1560 

AAACATCTGG CTCATATCTG GCATTCCTGC TCCTCCGAGA GCTGATAAGT CAGGCATACC 1620 

GCCTTGTCCC ATCATTCCTT CAAGGGCAGA CATATCCATT CCTCCCATAT TTGGCATATT 1680 

TTTAGGAAGG TTATTTGGAT TAATCCCCAT TTGCTTCATC ATTTTATTCA TATCCCCAGA 1740 

CATAACACCC TGCATGAGCT GTTTAGCCTG GTTAAAGTCC TTGATGAATT TATTGACTTC 1800 

GACGAATGTA TTTCCAGAAC CAGCAGCAAT ACGACGGCGA CGGCTTGGAT TTAACAAATC 1860

TGGGTTTTCA CGCTCTTCAG GTGTCATCGA AGACACAATG GCACGTTTAC GAGCAATCTG 1920 

GCGTTCATCC ACCTTCATGT TTTGAAGGGC TGGATTGTTG GCCATACCTG GAATCATCTT 1980 

GAGCAAGTCT TCCATCGGCC CCATATTTTG CACCTGATCT AATTGATCGA TGAAATCATT 2040 

AAAATCAAAG GTGTTTTCGC GCATCTTCTC AGCCATTTCA AGGGCTTTTT GTTCATCGTA 2100 

TTCCTGAGAA GCTTTCTCAA TCAAAGTGAG CATATCCCCC ATACCAAGGA TACGGCTAGA 2160 

CATGCGGTCT GGGTGGAAGG TTTCAATGTC CGTAATCTTT TCACCTGTAC CAGTGAACTT 2220 

GATTGGTTTT CCAGTAATGT GACGAACAGA CAGAGCAGCA CCACCACGAG TATCGCCATC 2280 

AATCTTGGTA AGGATGACCC CAGTCACTTC CAACTGAGCA TTAAACTCAC GCGCAACATT 2340 

GGCTGCTTCC TGACCAATCA TAGCATCAAC GACAAGCAAG ATTTCATTTG GTTGAGCCAA 2400 

TGCTTTCACA TCACGAAGCT CATTCATGAG GAGCTCATCA ATCTGCAAAC GACCCGCAGT 2460 

ATCAATCAAG ACATAGTCGT TATGATTAGT TTGGGCTTGC TCCAAACCTT GACGTACAAT 2520 
CTCAACAGCT GGTACTTCTG TTCCAAGTGC AAAGACAGGC ACATCAATCT GTTGTCCCAA 2580

GGTCTTAAGC TGGTCAATGG CAGCTGGACG ATAAATATCC GCCGCAATCA TCAAAGGACG 2640 

AGCATTTTCT TCTTTCTTGA GTTTGTTGGC CAATTTACCA GCAAAGGTTG TTTTACCAGC 2700 

CCCTTGTAAA CCAACCATCA TGATGATGGT TGGAATCTTA GGTGACTTGA TAATTTCTGC 2760 

CGTATCAGAA CCTAAAACGG CTGTCAATTC CTCATCAACG ATTTTAATAA TCTGTTGCGC 2820 

AGGATTAAGT GTATCAATGA CCTCATGCCC GACTGCACGC TCACGAACTT TCTTGATAAA 2880 

GTCCTTTACA ACAGGCAAGG CAACGTCGGC CTCGAGCAAG GCCAAGCGAA TTTCTTTGGT 2940 

TGCCTCTTGG ACATCAGATT CAGAGATTTT TCCTTTTTTA CGTAGATTTT TAAAGACGTT 3000 

CTGCAAACGT TCTGTTAAAC TTTCAAATGC CATTTTTCTT CCTCTTATTC TCTATTATCA 3060 

ATGCTTGTTA AAATTTCTAT CTGCTCCTGC AGAAAGTCAT CCTTGGGATA GCGCTCCAAA 3120 

ATCTGATCAA AAATCTGACT GCGGACAATA TAGTCCGAGT ACATGTGCAA TTTCATCTCA 3180 

TAATCTTCCA GAATCTTTTC TGTTCGCTTG ATATTGTCAT AGACAGCCTG ACGACTGACA 3240 

CCGAACTCCT CGGCAATTTC AGCAAGGCTG TAATCATCAG CGTAGTAGAG CTCGATATAA 3300 

TTCATTTGCT TATCTGTCAA AAGCGCCGCA TAAAATTCAA AGAGCGCATT CATACGATTG 3360 

GTTTTTTCGA TTTCCATAAC TTTTATTATA CCAAAAATTA GCCTAATCTA CCACACTAGG 3420 

AAGCCGATCC AAGAAGATAG ATAGCTAAAT TTGAAAAAGA CATGAGCCTA GCCCCAAGTA 3480 

ATTTCCAATT GATAGCTGGC AAAGGGATGT CCCTCTTGAT TTTGTAGTTG ATAATCTAGT 3540 

TCAATCTTTT GCCTATCAAC TTGATAATGG CTCGTTTGGA TGATAAACTC CTGCATGCCC 3600 

ATAGGTGTAG GAATATAGGC TAAACTATCG CTATCCTTTA GAAAGCGCAT AATGGTCTTG 3660 

GGATTAGAAA ATCGGCTCAT CACAAGTTCT TGACCATGAA ATTTAATCAC TACTTTTTCC 3720 

TTTTCCTCAT TATAGAAAAG CAGGTAGCTA TAATCTCCTT TTTCATGCAC TTCCACATCA 3780 

TAAAGCTGGT CAATCACTTC CAACTGCTCA TCAAACTGAA TCGTATTTCG CATCCGAATC 3840 

TTCACATCAG GCCCTCTTTC TTGTCTCTTG TCCTACTATT TTACCAAAAA GAGCAGGATT 3900 

TTGCTATAAT GGTCATATGA ACGAAAAAGT ATTCCGTGAC CCTGTTCACA ACTACATCCA 3960 

TGTCAATAAT CAAATCATCT ATGACTTGAT TAATACAAAA GAATTTCAGC GTTTGCGCCG 4020 

GATCAAACAA CTGGGAACTT CCAGTTATAC CTTCCACGGT GGAGAACACA GTCGCTTCTC 4080 

TCACTGTCTA GGAGTCTATG AAATTGCACG ACGCATCACA GAGATTTTCG AAGAAAAATA 4140 

TCCTGAGGAA TGGAATCCTG CCGAGTCTCT CTTGACCATG ACCGCTGCTC TCCTACACGA 4200

CCTTGGGCAT GGTGCCTACT CCCATACTTT TGAACATCTC TTTGATACAG ACCATGAAGC 4260

CATTACTCAG GAGATTATTC AAAATCCTGA GACAGAGATT CACCAAGTCC TGCTACAAGT 4320

GGCACCTGAT TTCCCAGAAA AGGTGGCCAG TGTCATTGAC CATACCTATC CTAATAAGCA 4380

GGTCGTGCAG CTCATTTCTA GTCAGATTGA CGCAGATCGC ATGGACTATC TCTTGCGCGA 4440

CTCCTATTTT ACAGGAGCAT CCTATGGGGA ATTTGACCTG ACTCGAATCC TCCGAGTCAT 4500

TCGTCCTATC GAAAATGGTA TCGCCTTTCA GCGCAATGGC ATGCACGCCA TCGAAGACTA 4560

CGTCCTCAGT CGCTACCAGA TGTACATGCA GGTTTATTTC CACCCCGCAA CACGCGCCAT 4620

GGAAGTTCTC CTACAGAATC TTCTCAAACG CGCCAAGGAA CTCTATCCTG AGGACAAGGA 4680

TTTCTTTGCC CGAACTTCTC CACACCTCCT GCCTTTCTTC GAAAAAAATG TGACCTTGAC 4740

TGACTATCTG GCTCTGGATG ATGGCGTGAT GAATACCTAC TTCCAGCTTT GGATGACCAG 4800

TCCTGACAAG ATTCTTGCAG ATTTATCGCA TCGCTTTGTC AACCGCAAGG TCTTTAAATC 4860

CATTACCTTT TCACAAGAGG ACCAAGATCA ACTTACTAGC ATGAGAAAAT TGGTTGAGGA 4920

TATCGGCTTT GATCCCGACT ACTACACTGC CATTCATAAG AACTTTGACC TCCCTTATGA 4980

TATCTATCGT CCCGAATCTG AAAACCCACG GACAGAGATT GAGATTTTAC AAAAAAATGG 5040

AGAACTGGCC GAACTCTCTA GCCTGTCTCC TATCGTCCAA TCCCTTGCTG GCAGTCGCCA 5100

CGGAGATAAT CGCTTTTATT TTCCAAAAGA AATGTTGGAC CAAAACAGCA TCTTTGCAAG 5160

CATTACCCAG CAATTTTTAC ACTTGATTGA GAACGATCAT TTTACCCCAA ATAAAAACTA 5220

GAAGAGGAAA TTTATGAGTA TTAAACTAAT TGCCGTTGAT ATCGACGGAA CCCTTGTCAA 5280

CAGCCAAAAG GAAATCACTC CTGAAGTTTT TTCTGCCATC CAAGATGCCA AAGAAGCTGG 5340

TGTCAAAGTC GTGATTGCAA CTGGCCGCCC TATCGCAGGC GTTGCCAAAC TTCTAGACGA 5400

CTTGCAGTTG AGAGACGAGG GGGACTATGT GGTAACCTTC AACGGTGCCC TTGTCCAAGA 5460

AACTGCTACA GGACATGAGA TTATCAGCGA ATCCTTGACT TATGAGGATT ATCTAGATAT 5520

[illegible] [illegible] [illegible] CATGCATGCC ATTACCAAGG ACGGTATCTA 5580

TACTGCAAAT CGCAATATCG GAAAATACAC TGTACACGAA TCAACCCTCG TCAGCATGCC 5640

TATCTTCTAC CGTACCCCTG AAGAAATGGC TGGCAAAGAA ATTGTTAAAT GTATGTTTAT 5700

CGATGAACCA GAAATTCTCG ATGCTGCGAT TGAAAAAATT CCAGCAGAAT TTTACGAGCG 5760

CTACTCCATC AACAAATCTG CTCCTTTCTA CCTCGAACTC CTTAAAAAGA ATGTAGACAA 5820

GGGTTCAGCC ATTACTCACT TGGCTGAAAA ACTCGGATTG ACCAAAGATG AAACCATGGC 5880

AATCGGTGAT GAAGAAAATG ACCGTGCCAT GCTGGAAGTC GTTGGAAACC CCGTTGTCAT 5940

GGAAAATGGA AATCCAGAAA TCAAAAAAAT CGCCAAATAC ATCACCAAAA CAAATGACGA 6000

ATCCGGCGTT GCCCATGCCA TCCGAACATG GGTACTGTAA AAGTATCATT TTTCAATAAG 6060

AATTGATTAG CAATAAAATC CAATGAATTT TTTTAGCAAA CTATTTAATT TAAAACAAAA 6120

TAATCATAAT AGAGACACAA ATTCTGATTG TAACAATTTT TACCTAAACG AATTAGAATG 6180 

TGGCCTTACT CCTGGGCAAC TCATACTCAT AGATTGGACT CAAAAAACAG GGAGAAATTA 6240 

TAATTTCCCA AGATATTTTA AATACTCTCT TCAAATTGAC CCTGAATCTA CACACAATCA 6300 

ATTATACAAA TTAGGATACT TCACTAAAAA TAAGACTTTA TCATATCTTA CAGTAGTAGA 6360 

ATTAAAAACT ATATTATCTA AACATAATTT AGCTACTTCT GGAAAAAAAG CAGAATTAAT 6420 

TACAAGAATA ATTAATAATG TTAACATTGA CAATTTAGAT ATTCCGTTCG AATTTAAACT 6480 

AACAAAAGAA GCACAAAATC TTATTATCGA ACATAGTGAC TATATCAAAG CATACTATGA 6540 

TAAAGACATA ACTATGGAAG ATTATTGTAA AGAAAAAAAC AATATCTCTT TTAAAGCAAC 6600 

TTTTGGTGAT ATAAAATGGA GTCTCTTAAA TAAACAAGCT CATAGGAATA CTGTATCAGG 6660 

AGATTTTGGA TGCTTATCTA ACACACGAAA GGCTCAGGGA AGACATTTGG AACAAGAAGG 6720 

TAATATTAAA CATGCTTTAA TATATTACAT AGAATCTTTG ATAATTACTA TTTCAGGATT 6780 

AGAAAACAAT TTTTCAGCCA CTGATTATCC AGTATATTAT CCCGATTCGA TACCTGACTA 6840 

CTCACTAAAA CATATTCAAA CATTAATGGA ATCATTATCT GATGACGATT ATGATTTTGC 6900 

TTTTGATGAA GCATTATTTC GCTTCTCAAT TTTGAATGCA AATCATTTTT TATCTAAGGA 6960 

AGATATTGAC TATTTAAGAG TTAATTTACC TCGTTCCACT GCTGAAGAAA TAAACAATTA 7020 

CTTAAAGAAA TATGAATGTT ATAGTCCTTT AAATAATTTA GAACTTGACG ATTTTGAATA 7080 

AATTGACTAT ACAAACATTT ATATACTCGA TATAGTCTCA ATTTTATCTG ATGATTGCCC 7140 

AAATTTTTCA ATAATAAAAC GCATAATATT ATGGAGACAA TCCCCTATAT TATGCGTTCT 7200 

TTTAATATCA AAGACTTTTT GACAAACTTC TTTGATATCT AATTACATGC CCCCTGCAGG 7260 

AATCGAACCT GCAACTACTC CTTAGGAGGG AGTTGTTATA TCCATTGAAC TAAGGGAGCT 7320 

AGATAAAAAC TCTGCTAAAT GAGCAGAGTT TTTTAGTCGA ATTAACGACG GATTTCTTTG 7380 

ATACGAGCTG CTTTACCTTG AAGAGCACGC AAGTAGTACA ATTTCGCACG ACGTACTTTA 7440 

CCGTAACGAA CAACTTCGAT TTTTTCAACA CGTGGAGTGT GGATTGGGAA GATACGCTCA 7500 

ACACCTACAC CGTTAGAGAT TTTACGAACT GTGTAGTTTT CTGAGATTCC AGCACCTTTA 7560 

CGTGCGATAA CAACACG 7577 
(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4945 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

CCTCGCTGAT GATTGGTGCT GTTTTATTTG CTGGTCCAGC CTTGGCTGAA GAAACTGCAG 60 

TTCCTGAAAA TAGCGGAnCT AATACAGAGC TTGTTTCAGG AGAGAGTGAG CATTCGACCA 120 

ATGAAGCTGA TAAGCAGAAT GAAGGGGAAC ATGCTAGAGA AAACAAGCTA GAAAAGGCAG 180 

AAGGAGTAGC GATAGCATCT GAAACTGCTT CGCCAGCAAG CAATGAAGCT GCAACTACTG 240 

AAACTGCAGA AGCAGCTAGC GCAGCTAAAC CAGAGGAAAA AGCAAGTGAG GTGGTTGCAG 300 

AAACACCATC TGCAGAAGCA AAACCTAAGT CTGACAAGGA AACAGAAGCA AAGCCCGAAG 360 

CAACTAACCA AGGGGATGAG TCTAAACCAG CAGCAGAAGC TAATAAGACT GAAAAAGAAG 420 

TCCAGCCAGA TGTCCCTAAA AATACAGAAA AAACATTAAA ACCAAAGGAA ATCAAATTTA 480 

ATTCTTGGGA AGAATTGTTA AAATGGGAAC CAGGTGCTCG TGAAGATGAT GCTATTAACC 540 

GCGGATCTGT TGTCCTCGCT TCACGTCGGA CAGGTCATTT AGTCAATGAA AAAGCTAGCA 600 

AGGAAGCAAA AGTTCAAGCC TTATCAAACA CCAATTCTAA AGCAAAAGAC CATGCTTCTG 660 

TTGGTGGAGA AGAGTTCAAG GCCTATGCTT TTGACTATTG GCAATATCTA GATTCAATGG 720 

TCTTCTGGGA AGGTCTCGTA CCAACTCCTG ACGTTATTGA TGCAGGTCAC CGTAACGGGG 780 

TTCCTGTATA CGGTACACTC TTCTTCAACT GGTCTAATAG TATTGCAGAT CAAGAAAGAT 840 

TTGCTGAAGC TTTGAAGCAA GACGCAGATG GTAGCTTCCC AATTGCCCGT AAATTGGTAG 900 

ACATGGCCAA GTATTATGGC TATGATGGCT ATTTCATCAA CCAAGAAACA ACTGGAGATT 960 

TGGTTAAACC TCTTGGAGAA AAGATGCGCC AGTTTATGCT CTATAGCAAG GAATATGCTG 1020 

CTAAGGTAAA CCATCCAATC AAGTATTCTT GGTACGATGC CATGACCTAT AACTATGGAC 1080 

GTTATCATCA AGATGGTTTG GGAGAATACA ACTACCAATT CATGCAACCA GAAGGAGATA 1140 

AGGTTCCGGC AGATAACTTC TTTGCTAACT TTAACTGGGA TAAGGCTAAA AATGATTACA 1200 

CTATTGCAAC TGCCAACTGG ATTGGTCGTA ATCCTTATGA TGTATTTGCA GGTTTGGAAT 1260 

TGCAACAGGG TGGTTCCTAC AAGACAAAGG TTAAGTGGAA TGACATTTTA GACGAAAATG 1320 

GGAAATTGCG CCTTTCTCTT GGTTTATTTG CCCCAGATAC CATTACAAGT TTAGGAAAAA 1380 

CTGGTGAAGA TTATCATAAA AATGAAGATA TCTTCTTTAC AGGTTATCAA GGAGACCCTA 1440 

CTGGCCAAAA ACCAGGTGAC AAAGATTGGT ATGGTATTGC TAACCTAGTT GCGGACCGTA 1500 

CGCCAGCGGT AGGTAATACT TTTACTACTT CTTTTAATAC AGGTCATGGT AAAAAATGGT 1560 

TCGTAGATGG TAAGGTTTCT AAGGATTCTG AGTGGAATTA TCGTTCAGTA TCAGGTGTTC 1620 



TTCCAACATG GCGCTGGTGG CAGACTTCAA CAGGGGAAAA ACTTCGTGCA GAATATGATT 1680 

TTACAGATGC CTATAATGGC GGAAATTCCC TTAAATTCTC TGGTGATGTA GCCGGTAAGA 1740 

CAGATCAGGA TGTGAGACTT TATTCTACTA AGTTAGAAGT AACTGAGAAG ACCAAACTTC 1800 

GTGTTGCCCA CAAGGGAGGA AAAGGTTCTA AAGTTTATAT GGCATTCTCT ACAACTCCAG 1860 

ACTACAAATT CGATGATGCA GATGCATGGA AAGAGCTAAC CCTTTCTGAC AACTGGACAA 1920 

ATGAAGAATT TGATCTTAGC TCACTAGCGG GTAAAACCAT CTATGCAGTC AAACTATTTT 1980 

TCGAGCATGA AGGTGCTGTA AAAGATTATC AGTTTAACCT AGGACAATTA ACTATCTCGG 2040 

ACAATCACCA AGAGCCACAA TCGCCGACAA GCTTTTCTGT AGTGAAACAA TCTCTTAAAA 2100 

ATGCCCAAGA AGCGGAAGCA GTTGTGCAAT TTAAAGGCAA CAAGGATGCA GATTTCTATG 2160 

AAGTTTATGA AAAAGATGGA GACAGCTGGA AATTACTAAC TGGCTCATCT TCTACAACTA 2220 

TTTATCTACC AAAAGTTAGC CGCTCAGCAA GTGCTCAGGG TACAACTCAA GAACTGAAGG 2280 

TTGTAGCAGT CGGTAAAAAT GGAGTTCGTT CAGAAGCTGC AACCACAACC TTTGATTGGG 2340 

GTATGACTGT AAAAGATACC AGCCTACCAA AACCACTAGC TGAAAATATC GTTCCAGGTG 2400 

CAACAGTTAT TGATAGTACT TTCCCTAAGA CTGAAGGTGG AGAAGGTATT GAAGGTATGT 2460 

TGAACGGTAC CATTACTAGC TTGTCAGATA AATGGTCTTC AGCTCAGTTG AGTGGTAGTG 2520 

TGGATATTCG TTTGACCAAG CCACGTACCG TTGTTAGATG GGTCATGGAT CATGCAGGAG 2580

CTGGTGGTGA GTCTGTTAAC GATGGCTTGA TGAACACTAA AGACTTTGAC CTTTATTATA 2640 

AAGATGCAGA TGGTGAGTGG AAGCTAGCTA AGGAAGTCCG TGGTAACAAA GCACACGTGA 2700 

CAGATATCAC TCTTGATAAA CCAATCACTG CTCAAGACTG GCGCTTGAAT GTTGTCACTT 2760 

CTGACAATGG AACTCCATGG AAGGCTATTC GTATCTATAA CTGGAAAATG TATGAAAAGC 2820 

TTGATACTGA GAGTGTCAAT ATTCCGATGG CCAAGGCTGC AGCCCGTTCT CTAGGCAATA 2880 

ACAAGGTACA AGTTGGCTTT GCAGATGTAC CGGCTGGAGC AACTATTACC GTTTATGATA 2940 

ATCCAAATTC TCAAACTCCG CTCGCAACCT TGAAGAGCGA AGTTGGAGGA GACCTAGCAA 3000 

GTGCACCATT GGATTTGACA AATCAATCTG GTCTTCTTTA TTATCGTACC CAGTTGCCAG 3060 

GCAAGGAAAT TAGTAATGTC CTAGCAGTTT CCGTTCCAAA AGATGACAGA AGAATCAAGT 3120 

CAGTCAGCCT AGAAACAGGA CCTAAGAAAA CAAGCTACGC CGAAGGGGAG GATTTGGACC 3180 

TTAGAGGTGG TGTTCTTCGA GTTCAGTATG AAGGAGGAAC TGAGGACGAA CTCATTCGCC 3240 

TAACTCACGC AGGTGTATCA GTATCAGGTT TTGATACGCA TCATAAGGGA GAACAGAATC 3300 

TTACTCTCCA ATATTTGGGA CAACCGGTAA ATGCTAATTT GTCAGTGACT GTCACTGGCC 3360 
AAGACGAAGC AAGTCCGAAA ACTATTTTGG GAATTGAAGT AAGTCAGGAA CCGAAAAAAG 3420

ATTACCTAGT TGGTGATAGC TTAGACTTGT CTGAAGGACG CTTTGCAGTG GCTTATAGCA 3480 

ATGACACCAT GGAAGAACAT TCCTTTACTG ATGAGGGAGT TGAAATTTCT GGTTACGATG 3540 

CTCAAAAGAC TGGTCGTCAA ACCTTGACGC TTCATTACCA AGGCCATGAA GTTAGCTTTG 3600 

ATGTTTTGGT ATCTCCAAAA GCAGCATTGA ACGATGAGTA CCTCAAACAA AAATTAGCAG 3660 

AAGTTGAAGC TGCTAAGAAC AAGGTGGTCT ATAACTTTGC TTCATCAGAA GTAAAAGAAG 3720 

CCTTCTTGAA AGCAATTGAA GCGGCCGAAC AAGTGTTGAA AGACCATGAA ACTAGCACCC 3780 

AAGATCAAGT CAATGACCGA CTTAATAAAT TGACAGAAGC TCATAAAGCT CTGAATGGTC 3840 

AAGAGAAATT TACGGAAGAA AAGACAGAGC TTGATCGCTT AACAGGTGAG GTTCAAGAAC 3900 

TCTTGGCTGC CAAACCAAAC CATCCTTCAG GTTCTGCCCT AGCTCCGCTT CTTGAGAAAA 3960 

ACAAGGCCTT GGTTGAAAAA GTAGATTTGA GTCCAGAAGA GCTTACAACA GCGAAACAGA 4020 

GTCTAAAAGA TCTGGTTGCT TTATTGAAAG AAGACAAGCC AGCAGTCTTT TCTGATAGTA 4080 

AAACAGGTGT TGAAGTACAC TTCTCAAATA AAGAGAAGAC TGTCATCAAG GGTTTGAAAG 4140 

TAGAGCGTGT TCAAGCAAGT GCTGAAGAGA AGAAATACTT TGCTGGAGAA GATGCTCATG 4200 

TCTTTGAAAT AGAAGGTTTG GATGAAAAAG GTCAAGATGT TGATCTCTCT TATGCTTCTA 4260 

TTGTGAAAAT CCCAATTGAA AAAGATAAGA AAGTTAAGAA AGTATTTTTC TTACCTGAAG 4320 

GCAAAGAGGC AGTAGAATTG GCTTTTGAAC AAACGGATAG TCATGTTATC TTTACAGCAC 4380 

CTCACTTTAC TCATTATGCC TTTGTTTATG AATCTGCTGA AAAACCACAA CCTGCTAAAC 4440 

CAGCACCACA AAACACAGTC CTTCCAAAAC CTACTTATCA ACCGACTTCT GATCAACAAA 4500 

AGGCTCCTAA ATTGGAAGTT CAAGAGGAAA AGGTTGCCTT TCATCGTCAA GAGCATGAAA 4560 

ATACTGAGAT GCTAGTTGGG GAACAACGAG TCATCATACA GGGACGAGAT GGACTGTTAA 4620 

GACATGTCTT TGAAGTTGAT GAAAACGGTC AGCGTCGTCT TCGTTCAACA GAAGTCATCC 4680 

AAGAAGCGAT TCCAGAAATT GTTGAAATTG GAACAAAAGT AAAAACAGTA CCAGCAGTAG 4740 

TAGCTACACA GGAAAAACCA GCTCAAAATA CAGCAGTTAA ATCAGAAGAA GCAAGCAAAC 4800

AATTGCCAAA TACAGGAACA GCTGATGCTA ATGAAGCCCT AATAGCAGGC TTAGCCAGCC 4860 

TTGGTCTTGC TAGTTTAGCC TTGACCTTGA GACGGAAAAG AGAAGATAAA GATTAAATAT 4920 

CGAAAAATCT TGTGAAATCT TTCCG 4945 
(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25002 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:

GACAACTCAA GTAGCTTTTT CTTATTTTGA AAAAGGAGAT CAGAGTTTAA CTATGTCAGA 60 

AAAATCACAA TGGGGGTCGA AACTTGGTTT TATTCTAGCA TCTGCTGGCT GGCCATCGGG 120 

CTTGGTTCCG TTTGGAAGTT TCCCTACATG ACTGCTGCTA ATGGCGGTGG AGGCTTTTTA 180 

CTAATCTTTC TCATTTCCAC TATTTTAATC GGTTTCCCTC TCCTGCTGGC TGAGTTTGCC 240 

CTTGGCCGTA GTGCTGGCGT TTCCGCTATC AAAACCTTTG GAAAACTGGG CAAGAATAAC 300 

AAGTACAACT TTATCGGTTG GATTGGCGCC TTTGCCCTCT TTATCCTCTT ATCTTTTTAC 360 

AGTGTTATCG GAGGATGGAT TCTAGTCTAT CTAGGTATTG AGTTTGGGAA ATTGTTCCAA 420 

CTTGGTGGAA CGGGTGATTA TGCTCAGTTA TTTACTTCAA TCATTTCAAA TCCAGCCATT 480 

GCCCTAGGAG CTCAAGCGGC CTTTATCCTA TTGAATATCT TCATTGTATC ACGTGGGGTT 540 

CAAAAAGGGA TTGAAAGAGC TTCGAAAGTC ATGATGCCCC TGCTCTTTAT CGTCTTTGTT 600

TTTATCATCG GTCGCTCTCT CAGTTTGCCA AATGCCATGG AAGGGGTTCT TTACTTCCTC 660 

AAACCAGACT TTTCAAAACT GACTAGCACT GGTCTCCTCT ATGCTCTGGG ACAATCTTTC 720 

TTTGCCCTCT CACTAGGGGT TACAGTCATG TTGACCTATG CTTCTTACTT AGACAAGAAA 780 

ACCAATCTAG TCCAGTCAGG AATCTCCATC GTAGCCATGA ATATCTCGAT ATCCATCATG 840 

GCAGGTCTAG CCATTTTCCA AGCTCGATCC CCCTTCAATA TCCAGTCTGA AGGGGGACCC 900 

AGCCTGCTCT TTATCGTCTT GCCTCAACTC TTTGACAAGA TGCCTTTTGG AACCATTTTC 960 

TACGTCCTCT TCCTCTTGCT CTTCCTTTTT GCGACAGTCA CTTTTTCTGT CGTGATGCTG 1020 

GAAATCAATG TAGACAATAT CACCAACCAG GATAACAGCA AACGTGCCAA ATGGAGTGTT 1080 

ATTTTAGGAA TTTTGACCTT TGTCTTTGGC ATTCCTTCAG CCCTATCTTA CGGTGTCATG 1140 

GCGGATGTTC ACATTTTTGG TAAGACCTTC TTTGACGCTA TGGACTTCTT GGTTTCCAAT 1200 

CTCCTCATGC CATTTGGAGC TCTCTACCTT TCACTTTTTA CAGGCTATAT CTTTAAAAAG 1260 

GCTCTTGCAA TGGAGGAACT CCATCTCGAT GAAAGAGCAT GGAAACAAGG ACTGTTCCAA 1320 

GTCTGGCTCT TCCTTCTTCG TTTCTTCGTT TCGTCATTCC AATCATCATC ATTGTGGTCT 1380 

TCATTGCCCA ATTTATGTAA TCAAAAAGGA CTTGAGTAGT GAACTCAGGC CCTTTCTTTT 1440 

TATGGATGGC TAACAATCAA TTCCAAACCT TGCCCTTCCA GAGTCCAAGC TTCAACATCA 1500 

CTTGGTAGGA TAAAGTGGCT GCCTTTTTGA ATTGGATAAT TTTTCCCGTC AACAGTTAGC 1560 
TGACCTTGAC CAGCCAAGAC ACTCAATAAG CTGTAGTCAG CTGTCTTTTC AAAGTCAACT 1620

TTTCCAGTAA TTTCCCACTT GTAAACTGCG AAGAAATCAT TAGATACAAG GAGAGTGGAA 1680 

CGCAAATCAT CTGCTTTAAC AGTTACAGGA CGGCTATTTG CTGGCTCACC AATGTTCAAG 1740 

ACATCGATGG ATTTTTCAAG ATGAAGTTCA CGCAAGTTGC CTTTGTCATC CTTGCGGTCA 1800 

AAGTCATAGA CGCGATAGGT GGTATCGCTA GACTGCTGGG TTTCAAGGAT TAAGATACCC 1860

GCCCCGATAG CGTGCATAGT CCCGCTTGGT ACATAGAAGA AATCTCCAGC CTTAACAGGG 1920 

ACTTTGGTCA ACAAGTCATC CCAGTTCTTG TCCTCGATTT GCTGGCGGAG TTCTTCTTTT 1980 

GACTTGGCAT TGTGACCGTA GATAATCTCT GAACCTTCAT CCGCTGCGAT AATGTACCAG 2040 

CATTCTGTTT TTCCGAGTTC GCCTTCATGC TCGAGTCCAT AAGCATCGTC TGGGTGAACT 2100 

TGGACACTGA GCCAGTCGTT GGCATCGAGG ATCTTGGTCA AAAGTGGAAA TACAGGTTCT 2160 

GGACGATTGC CAAATAATTC ACGGTGTTCC GCATACAAAG TAGCAAGATC TGTTCCCTCG 2220 

TAACGACCAT TGGCAACTTT AGAGACTCCA TTTGGATGGG CTGAGATGGC CCAATATTCT 2280 

CCGATTTTTT CACTTGGGAT GTCGTAGCCA AACTCATCAC GTAGCTTGGC TCCACCCCAG 2340 

ATTTTTTCTT GCATAACTGA TTGTAAAAAT AATGGTTCTG ACATGTCGAT CTCCTGTCTG 2400 

ATTTTTCTCC CCTCATTATA GCAAAAAAAG AGTTCGAATT GAACTCTTTT TTACATCTTA 2460

TAAAGCAGGG AGAAGATTTT ATAAAAATAG TAAACAAATG TGCTCTACCC GATGCTTGCA 2520 

CCATTGCTAT AAATGACATC CTTGTACCAA TAGAAGGACT TCTTCTTGCT ACGTTTGAGA 2580 

GCTCCGTTTC CTACATTATC TCGATCTACA TAGATAAAGC CATAGCGCTT ATTCATTTCC 2640 

CCTGTGCCAG CTGAAACCGG ATCGATACAG CCCCAAGTCG TATAACCAAG CAAGTCAACC 2700 

CCGTCTTGGT AAATGGCATC TCGCATGGCC TTGATGTGGG CCTCTAAGTA AGTAATCCGA 2760 

TAGTCATCTG CTACATAACC ATTCTCATCC GGTGTATCCA TAGCACCGAG TCCATTTTCT 2820 

ACGATAATAC TAAACTAAAA TCAAAAAGCA TTATATAATA GTGATATGAA ATCAACTAAA 2880 

GAAGAAATCC AAACCATCAA AACACTTTTA AAAGACTCTC GTACAGCTAA ATATCATAAA 2940 

CGCCTTCAAA TCGTTCTATA GTAAAATGAA ATAAGAACAG TACAAATCGA TCAGGACAGT 3000 

CAAATCGATT TCTAACAATG TTTTAGAAGT AGGGGTGTAC TATTCTAGTT TCAATCTACT 3060 

ATATTTCGTC TGATGGGCAA ATCTTATAAA GAGATTATAG AACTTTTATA GTAGTTTGAA 3120 

ATAAGATGTG AACAACTCTA TCAGGAAAGT CAAATTAATT TATAGAAATA TTTTAGCAGC 3180 

CAAGGTGTAC TGTTATAGAT TCAATACACT ATAGACTGTA ATCAAACAAC GATTTGGCGA 3240 

AATGTAAAAA AATATGAGGA GTTCGGACTC GACTCTCTCC TTCAAGAAAC ACGTGGTGGT 3300 

CGTAACCATG CATATATGAC AGTTGAGGAA GAGAAAGCCT TTCTTGCCCG CCATTTGAAG 3360

GCTACAGAGG CAGGAGAATT TGTTACAATT GATGCCTTAT TTCAGGCTTA TAAAAAGGAG 3420

TTAGGTCGTT CCTACACACG TGATGCCTTC TATCAACTGT TGAAGCGCCA TGGTTGGCGA 3480 

AATATTACGC CACGTCCAGA ACATCCTAAG AAAGCAGACG CTCAAACCAT TGTTGCGTCT 3540 

AAAAATAAAA TCTCAATCCA AGAAGGCAAG AAAGCGTTTT AAATATAGTA GACGTTTTCG 3600 

TAAGGTTTGC TTGATGTACC AAGCTGAAGC TGGTTTCGGT AGAATCAGTA AACTGGGATC 3660 

TTGTTGGGCT CCAATAGGAG TAGGTCCACA TATCCATAGT CACTATATAC GAGAATTTCG 3720 

CTATTGTTAT GGAGCTGTTG ATGCCTATAC AGGCGAATCA TTTTTCTTAA TAGCTGGTAG 3780 

ATGTAATACT GAGTGGATGA ACGCCTTTTT AGAAGAGCTT TCACAAGCTT ATCCTTTTAC 3840 

TCGTTATGGA CAATGCTATA TGGCATAAAT CAAGTACCTT AAAGATTCCG ACTAATATTG 3900 

GTTTTGCATT TATTCCTCCA TACACACCAG AGATGAACCC CATTGAACAA GTGTGGAAAG 3960 

AGATTCGTAA ACGTGGATTT AAGAATAAAG CCTTTCGAAT TTTGGAAGAT GTCATGAATC 4020 

AACTCCAAGA TGTCATACAA GGATTGGAGA AGGAGGTGAT AAAGTCCATC GTTAATCGGA 4080 

GATGGACTAG AATGCTTTTT GAAAGCAGAT GAGTATTATA TGCAATTTCT TTATATAAAA 4140

AGACCGGATT GCTCCGATCT TTCAATAGTT CATATTCTCA ATTTCTATTT TAAAAATAGC 4200 

TAAGGTTAAC GTCAAATGAC TACGCGACCT ATTTCATACG ATAAAAATCA AGCACTAGAC 4260 

CAGCAGGTCC TTGAACTAAT AAGGACTCTG TTCCCCAATC GGTTACAGTT GGTCCGTGTA 4320 

AAACCTTTAT ACCAAGCTCG TTCAACCGTT TGTAGTTCTG GTCTACATCC TCAACCTCGA 4380 

TATGAATAAT GATTCCTGAC TGAAAGTTTT CCAAAGGAAC CAAATGATTT TGTGACAACA 4440 

TAAGGCAGTG ACTACCAATC GTAAACTGAG CAAAACCATC ATTAGCATAA TCTGCCTTTT 4500 

TATCCAAGAT ATGCTCCAAG TCAGCACAGA CTTGGGGAAC ATTTGAAACG ATAATATCTA 4560 

ATTGATTTAA ATTCATTTAC TCTCCTCCAT AAAAAGACCG GATTGCTCCG ATCTTTTAAA 4620 

GTTCTGCTCT ATGAAAATCA AAGAATAAAG TCTACAAGTT TCATATTTGA TTTTCGGCGA 4680 

GAGGAATTAT TTAATTGCGC GTGATTGCAA TCCTTCTTCT TCCAAGAAGA GACGGAATGG 4740 

TACGAGTTCT TCTGCTTCGT ATTTTTCCTT GAAGGCTTTG ATAGCTTCTT CTGAGTGAAG 4800 

TTTTGGATCC AATTCAAGTA CTTCTACTGG AAGTGGACGG TGTTGAGTGA TGCGAGCATC 4860 

GATGACAACA GTTTTACCTT CTTTGTTCAA TTTAACAGCT TCTGCAACAA CTGCATCGAT 4920 

GTCTTCGATA CGGTCAACTG TGAATCCAAC AGCTCCTTGA GCTTCCGCAA TTTTAGCGTA 4980 

GTCAGCGTTT GTGAAGTCTA CACCAAACAA GTGTTTGTTT GTATCTTCGT ATTTGTTCTT 5040 

GATGAAGCCG TACTCAGCAT TTGAGAAGAC AAGGTTGATA ACTGGAAGGT CGTATTGAAC 5100

GTTTGTGATA ACGTCTGGGT AGCACATGTT GAATGCTCCG TCACCCATGA TGTTCCATAC 5160

TTGGCGATCT GGATTGTCTT TCTTAGCAGC GATACCACCA GGAAGGGCAA TACCCATTGT 5220 

CGCAAAGAGT GGAGATGTAC GCCACATGTT CTTAGGTGTC ATGTGAAGGT GACGAGTAGA 5280 

TGTTTGAGTA GTGTTACCTA CGTCGATTGA GTAGATAGCG TCTTGATCAG CATGTTTGTT 5340 

GATTGCATTG TAAACTTGAT ACAATTGCAA TTCACCCTCA GTTTTACCTT CGAGTTTGTT 5400 

CATGTAATCA CGCCAGTTTT GGTTGTTCTT AACGTTTGCA CGCCACCATG GAGTTGATTC 5460 

AACTGGGTTT ACTTTGTCAA GGATAGCTTT AGCTGCTTGA CCAGCATCAC CAAGGATTGA 5520 

AGCGTCAAGG GCATGACGTT TACCAAGTTT GTAAGGGTCG ATATOGACTT GGATGAATTT 5580 

TTCAGTGTTC TTGAATGCTT CGTAAACTTC AGCAAATGGG AAGTTTGAAC CAAGGAAAAG 5640 

AACTGTGTCT GCTTCAAAGA CCACTTCGTT GGCTGGTTTC CAACCAACAC GGTAAGCAGA 5700 

ACCTGTCAAA CCTTCATAGT TCCATTCGAA AGCTTCAAAG TTTTTACCAG TTGTGATGAT 5760 

TGGTGCTTTG ATTTTACGTG ACAATTCAGT AATCACTTCA CCAGCTTTAA CACCACCAAA 5820 

TCCAGCATAG ATAACTGGGC GTTCAGCATT GTTCAAGATT TCAACAGCTT TGTCGATTTC 5880 

AACTTCGTTC AAAGCAGGAG CGATGAATGA GCGTTCGTAT GAACCTGAAC CGTAGTATGA 5940 

GTTTTCATCG ATTTCTTGGA AACCGAAGTT TACTGGAATT TCAACAACAG CTGGACCTTT 6000 

TTTAGAAACT GCAGCACGGC AGGCTTCGTC AATTACTTTT GGCAATTGCT CAGCGTAAGC 6060 

TACACGTTTG TTGTAAACAG CGATACCGTT GTACATTGGG TTTTGGTTAA GCTCTTGGAA 6120 

AGCATCCATG TTCAATTCGT TAACTGGACG TGATCCAAGG ATCGCTAGGA ATGGAGTGTT 6180 

ATCCATAGCT GCATCGTAAA CACCGTTAAT CAAGTGAGTC GCACCTGGAC CACCTGAACC 6240 

AACTGCAACC CCGATTGAGC CGCCGAATTT AGCTTGCATA ACCGCTGCAA GAGCACCTGT 6300 

CTCTTCGTGG CGAACTTGTA AGAAACGGAT ATCTTTGTCT TCAGCCAAAG CGTCCATCAA 6360 

TGAGCTGAGT GTTCCTGATG GGATACCGTA GATTGTATCT ACGCCCCATG TTTTCAATAC 6420 

GTTAAGCATT GCTGCAGATG CAGTAATTTT CCCTTGAGTC ATAATGATAA CTCTCCTTCA 6480 

ATTTTTTTAA ACTTGGAGAA TACGATTACA TAGAATTGGA AACGTTCTCC AAATTTTTAC 6540 

TATTCCACTG TATCATATTT ATGCTGACTT TTCTAAAAAT CTGCTCAAAA CTCTCTATTC 6600 

TCTATTCTAA TACAGTTTTG AAAGTTCTGT CATTTCTGTT TTATAACAAA GAAATCTAGT 6660 

CATTACTTTT AGTCTATTTT ACTAAAATTT AACAGAAGGG AACTGGTCAG AACAGATACA 6720 

GAACTAAAGG CCATGGCTAG ACCTGCCAAT TCTGGGTTGA GAGCCAGTCC AACACCTGAA 6780 

AAGACTCCTG CTGCAATCGG AATTCCGACA ACATTGTAGA TAAAAGCCCA GAAAAGATTG 6840 

AGTAGAATTC GATGAAAGGT TTTCTTACTC ATATCAAAGG CACGAACCAC TCCTAAAAGA 6900

TTATTGGTTG TCAACACCAA ATCTGCTGAC TCGATGGCGA TATCTGTTCC AGCTCCCATA 6960

GCAATCCCCA CATCTGCTAC ACTAAGGGCA GGAGCGTCAT TGATACCGTC CCCAACAAAG 7020

GCTACTTTCC CTGACTGTTG CAGTTTATGG ATTTCATGGG CTTTTTCTTC TGGCAAGACG 7080

CCTGCAATGA CCTCTTCAAT TCCGATTTGA TCTGCAATAG CACGCGCCAC ACCAGCATTG 7140

TCTCCTGTCA GCATGACTGT TCGGAGACCA CGTTTTTTTA GCTGACTGAT GGCTAGCTTA 7200

GCATTTTCCT TAGGAATATC TTGCAAAGCA AGCAAGCCTT TGATTTCATT GTCAACAGCT 7260

AAGAACACAA CTGTCTTAGC TTCTTTTTCT AGTTCTTCTA GTTTATCTTG ATAAGTATTA 7320

GAAATATCCA TGCCATCCAG CATTTTAGCA TTTCCAAGTA AAACTTGTTT TCCATTGATT 7380

CGCCCTGAAA CACCTTTCCC GTGCAAGGAC TGAAAATTTT CAACAGTTTG AAACTCAAGT 7440

CCAGCTTCAC TCGCTCGCTT AACGATAGCC TCAGCCAGTG GGTGTTGAGA AGCATCTTCC 7500

AAGGAGGCTG CCAACCCAAA CACTTCTACT TCGTCGCCGA TGACATCTGT TACCACAGGT 7560

TTCCCTTCCG TCAAAGTCCC GGTCTTATCA AAGACAAGGG TTTGAACTTT CTGGATTTCC 7620

TGTAAGACAG TTCCATTTTT GAGGAGAACC CCCATCTTGG CACTACGTCC TGTCCCCACC 7680

ATAAGGGCTG TCGGTGTTGC AAGTCCCAAG GCACAAGGAC AGGCGATAAT CAAAACCGCC 7740

ACTCCGTAGA GAAGAGAGGA CACAAAGCTA GCTCCAAGCA CAACCACACT ATCCCTGAGC 7800

AAGACGAACC AAACCCAAAA GGTCATGATT CCTAAAATGA CAACTACTGG GACAAAAATC 7860

CCTGAAATCT TATCCGTCAA GTCCTGAATC GGCGCACGAC TTGTCTGAGC TTTCTTCACA 7920

AAATCCACAA TCTGAGCCAA AACAGTCTCT GAGCCAACTT TTTCTGCTCT AAAGACAAGC 7980

GTTCCACTAT GATTGATGGT TGAGCCAATG ACAGTATCTC CAACTGTCTT GTCCACAGGC 8040

AGACTCTCAC CTGTCACCAT GGATTCGTCA ATACTAGAGA CACCTTCTAC TACGACACCA 8100

TCAACAGCAA TCTTTTCACC GGGACGCACT CGAATCAGGT CGCCTACCTT GACTTGTTCC 8160

AAAGGAACTT GGACATAACT ATCATCACTC AAGACTTCTG CGGTTTTAGC TTGCAAGTCC 8220

AGTAATTTCT CCACAGCTTG GGACGTATTT TTTCTCATTT TTTCCTCAAA AACTGCTCCC 8280

AAAAGAACGA AAAAGAGGAT AAATCCAGCA CTTTCGAAGT AAACAGGGAG ACCAGCAAAG 8340

AGAGCAACTA GGCTATAGAA ATAAGCCACT AGAGTTCCCA GCGCAACCAA GGTATCCATG 8400

TTGGCATTGT GCTTTTTAAA ACTGGCCCAA GCACTCTGGA TATATGGCTT ACCTGCAACT 8460

AACATAATAG GCGTTGTTGC TAGAAAGGTT CCCCAATGCA TGACTTGATG ACTAATGCTA 8520

CCTGTCAACA TCCCAATCAT GAGAATCACA AGAGGCACAG TAAAGATACT AGTAATCCAA 8580

AAACGTTGCA GGAGAGATAG AGATTTTCGA GTCTTCTCAA CGACTGTATA GCTTCCCTTT 8640

TGCATCTTCA TGCCACAAGA AAATTCATGT CGCCCTAATT CTTGAGGCGT AAAACGAATG 8700

ACTTTCTCCT CATCTACGCC GATTGGTTCC AAGATACCTT CTTCTTCAAA CAGAATTTCC 8760 

TTATAACAGT TTGAAGGAGT AGCACGATGA AAGGTAATCT CAGCTGGAAT TCCCTTTTGA 8820 

AGCTGGATAT GGGCTGGATG ATAGCCTTTT TCAGCTCGGA TACGGATTTT TTGAATGCCA 8880 

TTTTCTAAGC TTGCTTTCAC AATTTCTGTC ATAGTCTCCA CCTACTCTAC AATCATCTTG 8940 

CCGTGCATCA TGTTCATACC ACAAGCAAAG CCAAACTCTC CAGCCTGTTC AGGOGTGATT 9000 

TCCACTACAT ACTCTTCCCC CATTGGCAGG TTCGCATGTA CACCAAAATC TGGAAAAACA 9060 

ATTTGATCCA GACATGGTGA AGGATCCTTG CGGTCAAAGA CAATGCGTGC TGGCACTGAT 9120 

TTCTTGAGGA CAATCAACTC AGGAGTATAG CCTCCCATGA CTTCCACTCG AATCTCTTGG 9180 

TATCCGTTTT TTTGCTGGGC TTTTTGTCCA GATTTTTCAG GCTTTTTGAA AAACCAAAAC 9240 

AAGATAAACG CGATAAGGGC AATACAAATA ATGGTTACAA TACTATTTAA CATGACGTCT 9300 

CCTTTACATA CAATTACATC TTACTTCTGT TACAGCACTT GATTTCTTCT CTGAAATCAC 9360 

AGCTTCCAAG TCTTCCAAGT CAGTCTGAGT AAATTCACAT TCTACAATCA AGTCAGCCAA 9420 

CAAATTCCTA ATCCTACGGG AACAAACCTT GTCTTTGATA TCTTGGACAA GTAAATCCCG 9480 

ACTTTGGTCT AGAGTTAAAA GGGCTGAATA AACAAAGGAC TTGCCTTCTT TTTTCCGAGT 9540 

CAAACACTCT TTATCAACCA GACGAGCCAA AAGTGTCTGA ACCGTGGACT TGGACCAGTC 9600 

AAACCGCTCT GCCAAAACCC TAATCAAATC TGTACTGGTC TGCTCCCCCT GCATCCAAAT 9660 

AATCTTCATG ACCTGCCATT CTGCATCTGA AATCTGCATT ACCATACCTC CAAAATCTAC 9720 

ATTTGTCAAT TACACTCATC AGTATACTCT TAAAATCTAC ATTTGTCAAT TATAGAAATA 9780 

ATATTTTCTT CGAAAAATAG AATTTTAATC ATTTGAAAAA CGATTTGCAG TCAAATATTA 9840 

CTATATAAAC AATAAAAATA TGCTATACTA AAGAAAAAAG AAAACAACCA CTAGGGGTGC 9900 

GTAAAGCTGA GATTAACGAC TGTTAGATCC CTCTGACTCA ATCTAGGTAA TGCTAGCTGA 9960 

TGGAAGTGGA AATGATAATG GGGACTAGCA GTCTTCTATT GCCTTTCTAA AACAGACTAG 10020 

CTTGTTCTTA AGAATACAAA CTTCAGTTGG TTGGGAGGTT TTAGATGACT TATTTACCCG 10080

TTGCTTTGAC CATTGCAGGG ACTGACCCTA GTGGTGGTGC TGGCATTATG GCAGATTTAA 10140 

AGTCATTCCA AGCGAGAGAT GTCTATGGAA TGGCTGTTGT AACCAGTCTT GTCGCTCAAA 10200 

ATACCAGAGG TGTTCAGCTA ATCGAGCACG TTTCTCCTCA AATGTTGAAA GCCCAATTGG 10260 

AGAGTGTCTT TTCTGATATT CCACCTCAGG CTGTAAAAAC TGGAATGTTG GCTACTACTG 10320 

AAATCATGGA AATCATCCAA CCCTATCTTA AAAAACTGGA TTGTCCCTAT GTCCTTGATC 10380 

CTGTTATGGT TGCTACAAGT GGAGATGCCT TGATTGACTC AAATGCTAGA GACTATCTCA 10440

AAACAAACTT ACTACCTCTA GCAACTATTA TTACGCCAAA TCTTCCTGAA GCAGAAGAGA 10500

TTGTTGGTTT TTCAATCCAT GACCCCGAAG ACATGCAGCG TGCTGGTCGC CTGATTTTAA 10560

AAGAATTTGG TCCTCAGTCT GTGGTTATCA AAGGCGGACA TCTCAAAGGT GGTGCTAAAG 10620

ATTTCCTCTT TACCAAGAAT GAACAATTTG TCTGGGAAAG CCCACGAATT CAAACGTGTC 10680

ACACCCATGG TACTGGATGT ACCTTTGCTG CAGTGATTAC TGCTGAACTA GCCAAGGGCA 10740

AGAGTCTTTA CCAGGCAGTT GATAAGGCCA AGGCCTTTAT CACAAAAGCT ATTCAAGATG 10800

CCCCTCAACT CGGTCATGGT TCTGGTCCAG TCAACCATAC AACTTTTAAA GATTAAGAAA 10860

AAAAACTCTC TAGTTCCCAC TTTAAGGGAA TTAGAGAGTT TTTATACTCT TCGAAAATCT 10920

CTTCAAACTA CGTCAGCTTC CATCTGCAGC CTCAAAACAC TGTTTTGAGC TGACTTCGTC 10980

AGTCTTATCT AAAACCTCAA GGCAGTACTT TGAGCAACCT GCGACTAGCT TTCTAGTTTA 11040

CTCTTTGATT TTCATTGAGT ATTAATTAGG AAAGAATGTT ATGCAACTTT TTTAAAAAGG 11100

CTTGCGTTTT TGCCTCAATA TCTTCTGCTT GCATCAAATC ACGTACAACA GCTACACCAG 11160

CTATGCCAGT GCCCATAAGC TGATCAATAT TCTCCGAAGT CAAGCCTCCA ATAGCAACTA 11220

CTGGAATGGC AACCGTTTGG CAAATTGTTT TCAAGGTCGA TATCAGAGTA ATGGGCGCAT 11280

TTTCCTTGGT GGTGGTTGGG AAAATGGCTC CTGTACCCAA GTAATCTGCA CCTGATTTCT 11340

CCGCTTCCAG AGCTCTTTTA ACCGTTTTAG CGGTGACACC GAGGATTTTT TCAGGACCCA 11400

AGACTTTGCG AGCTACCGAA ACTGGTAATT CATCATCTCC GATATGCAGA CCTGCTGCAT 11460

CAACCGCAAG ACAAACATCC AACCGATCAT CGATTATCAA GGGTACCTGA TAAGCATCTG 11520

TTATTTCCTT GACTTGTTTT GCCAGTTGAT AATATTGATT GGTTGTGAGA TTTTTTTCTC 11580

GCAATTGGAC TATGGTAACC CCTGAACGGC AGGCCGTCTC AACTTTTGCA AGAAAGCTTT 11640

CCACGGAATC TTGATAGCGA TTGGTTACCA GATATAGTCT AAGTGCTTCT CTATTCATAA 11700

ACCTCTCCTT TGATGGTATC TAGCCAATTT TCATCTCTTC TTAGGAGCGA AAGCTGATTG 11760

AGTACTTGGT AACGAAATTC TTCCAATCCC ATTCCTTGAA CAACTATTTT CTCAGCAGCG 11820

ATATTGAGAT AAGAGACTGC TAAGCAAGAA GCTTCAAAAC CAGTCTTTCC TTGGCTGAGA 11880

AAAACAGCTG TTAAGGCTCC AACCAAGTCT CCTGTCCCTG TTATCCAGTC TAATTCAGTA 11940

CAGCCATTTC CCAGTACAGC GACCTGATTT TTCGAAACGA CGAGGTCCTT GGGACCTGTG 12000

ACTAAGAAAG ACATACCAGG ATAGGTCTGA CACCAGTCTT TCAAGACTTG AAGCAAATCC 12060

TCCGTTTCTT GATCTTTAGC ACTCGCATCG ACCCCAACGC CGTGGTGCTT TAATCCAACA 12120

AGACTTCGAA TTTCTGACAT GTTTCCTTTA AGGACCGTAG GTCTATAGTC TAAAAGGTCT 12180

TTAACTAAGC TCTTACGAAT GGATGAAGTC GTTACGCCAA CCGCATCTAC TACCATCGGG 12240

AGAGAAGATT GGTTTGCATA CGAAGCTGCC ATGCGGATTG CTTTTTCCTT CTCAGCTGAC 12300 

AAATGCCCCA AATTGATGAA GAGAGCCTGA CTTTGCTTAG TAAAATCAAG AACTTCACGG 12360 

GAATCATCTG CCATGACAGG TTTGCATCCC AGAGCCAAAA TCCCATTTGC CAGCATCTCA 12420 

CAAGAAATCT CATTGGTAAT GCAGTGAATG AGGGAACTAG AGCCTATAGG AAAGGGATTT 12480 

GTAAATTCCT GCATCAGTCT ATCCTTTCAC TAAAGAAATA TCCCTGCACT TTTTTAAAGA 12540 

ATTCCTGCTT GATTAAAAAT CGAAAGGCAA TAAAGGAAAT CGCTGTACCA ATCAAGGTTG 12600 

CTCCGAAAAA TCGAGGCGTG TAGATAAACC AGCTAAGCTT AGCAGCTGAT CCTGTAAAGA 12660 

GTACCATAAC AGGATAGGAA ACAATGGAAC CAATAATACC TGTTCCCAAA ATCTCTCCTA 12720 

GAGCAGAATA GTGAAATTTT CGACCGTACT TATAAAAGAG ACCTGCTAGA AGGGCTCCAA 12780 

AAGTCGCTCC TGTGAGAGCT AAAGGCGGAA TCCCTTGAGT CGTCATACGG ATAAAGGCTG 12840 

TGACTGTAGC CATAGCCAAG GCATAAACAG GTCCCATCAT GATTCCTGCT AGAATATTGA 12900 

CTACACTGGA CATCGGTGCC ATTCCCTCAA TTCGAAAGAT AGGTGTAAGG ACTACATCAA 12960 

GGGCAATCAT CATAGATAAA ATGGTTAATT TGTGAACTTG TAATTGGTGC TTTCTCATGC 13020 

TTCTATTCTT CTCCTTTTTC TAAAGACTGT AAATCGCTCT TCCATGTCTG GTGTTGGTAG 13080 

GCCATTTCCC AAAACTTGGC TTCCATATGA ACACTGATGT GGAAGGCATC TAGCATTTTT 13140 

TGCTTGTCTG TCTCGTCACT TTCTCGATAG AGCTGATTGA CCAGTGCTCC CTCCTCTCTG 13200 

ATCTGTTGCT CTAACTCATC CGTAATATAA GTTTCAATCC ATTGTTGATA GAGAGGATTT 13260 

GGTGATGGTT TAAGATTAAG TGATTTGCCT ATATCATGGT ATAACCAAGG ACAAGGAAGC 13320 

AAGCTTGCAA AAGCGATGGC TAAGTTCGGT TCTGCAAATT GCCTATAAAT ATGAGAAATG 13380 

TAATGATAAC AGGTTGGAGC GATTGGATGT TGCTCCATTT CCTGGTCGCT GATTTCCAAT 13440 

TCCTTGAAAA ATTGTTGGCG AATAAATAAC TCACCCTCCA CTAAACCCTG AGCATTTTGT 13500 

TTCAAGAGTC TTTTCATCTC TTGGTTTGAA GTCTTATCAG CCAAAAGATG ATAGATTTCT 13560 

GAGAAAGCCT TCAGATAGTA GGCATCCTGA ATCAGGTAAT AGCGGAAAAT GGCAGGTTCT 13620

AAATTCCCCT CTTGTAATTG TAAAATAAAG GGATGATGAA AGGAAGCCTG CCAAGCTTTC 13680 

TTGGATAATT CCATCGCAAT ATCTGTAAAT TCCATAATAA CTCCTTTATA AAAATAGACT 13740 

GGTTTGAAGC AATAAAAAGA AAAGCAGGTA GATTAATTTT GTTTTTTTAG GAATATAAAA 13800 

AGTCCGATAG CTATTCTTCA ACTGTGCATG TTCGTCATAT CCGTGAGCAG ATAGAGCTCT 13860 

CAGGTAAAGA TGGCGCCACC TAAAGACTGT CATCAGAACC TTACTGTAAA TCAAGGGCGA 13920 

CCAAAAATGT AGTTCTTGAC CACGTAATAG GCAAGCTTCT TTGAGGGACT TGATTTCTTG 13980

CTGAATGAGA GGAAAAGAAT TGAATACCAC AATCAAGGCA TAGGACCAAG AGCGTGATAG 14040

CCCCTTTTGA GCCAAGTACA AGAGAAGCTC TTTTAGTGAA ACAGAGGAAA CAAAGACAAG 14100 

GCCGATACAA ACTGTCACAA AGGCCCTCGT TCCAAGCATG ACTGCCTGTG AAGCATCTCC 14160 

GTGTAACTGA ACTGCCCAGT AGTTGGCAAA AGATGGTAAA ATGGCAAGTA TGATCATCCA 14220 

AGCTAACATT TTAAATCGAC GGTAATAGAG CATAAAGAGA ATACAAAATG CGACTACCGA 14280 

AAGAGTCAGA GCAATCGAAG GAATGAAAGA TGTTTCCAAG GATAAAATCA GCAAGAAGAG 14340 

ACTGATAATC GGTGTCTGGG TTGCTACTTT GACCATACTA TCTCACCTCC CCTTGGGTAT 14400 

TGCTACTCTG AGATGTAAGT GGTTTGGTAA TGGTCACTTC TTTCACATGC CGAAGACCCT 14460 

GACTAGTCAT CTCAATCCAA TAATCAACCA CAGAAATCAA AGGGTCTAAA CGATGACTAA 14520 

TGAGCAGAAA ACTTCTTCCT TGATTCCTCT CCTCCACAAT CCACTTGCAA AAATAATGGC 14580 

AGGCTCTATC ATCCAAACCT GCAAAAGGTT CATCTAGCAA GATCACGGAA GCCTTACTGG 14640 

TCAAGATGGT CAGGAGCTGA AGAATTTTTT GCTGACCACC ACTTAATTGA TAGGGACTCT 14700 

TATCGACTGC CTGCTCCAAA TCAAAATATC GTAAAGCTTG AAAAATCCGC TGATTTCTTT 14760 

CAGAATCAGG TCCATCTAAT TGAAGCTCCT CTCGCAGACT GACTCGGATA AACTGCTTCT 14820 

CAGCTTCCTG AACAACACCA GTCAGATCAC GATACAAACT CTTTTTCTTT TTCAGGACCG 14880 

AACCCTTCCA AGTAATGCTC CCCTTATACT TTTGAAATTG AAGAATAGAC CGAAAGAGGG 14940 

TTGATTTCCC GACACCATTG TCACCCAGGA TACAGGAAAT CCCTTGATAG AATGTGAAAT 15000 

CAGCAATTGA AAAGAGGGGG CGATTACCAA GCTCACCAGT CACACGGTTC ATATGGAATA 15060 

GTTCCGGGCT AGAAGCAACT TCCTTTGAAG CAACCTGTGT CATCTCATAG GAAGGGATTT 15120 

GAAACACTTC CCTTAGTTTT CCGTCTCTTA GCTCCACCAT ATGGTCGATA TAGGCTTTAT 15180 

AGTCAGATAA ATCATGGTCG CACAAAATAA CTGTCTTCCC ATCATAGACC AACTCTTTTA 15240 

GAATCTCCAA TATCTCGATT CTGCTCTTGC GGTCAATGGA AGCGAAGGGC TCATCCAAGA 15300 

GATAGACCCT AGGATTCATG GCAAAGAGGA CAGCCAGCGC TGCTTTTTGC TTTTCCCCAC 15360 

CTGATAAGTG ATGGATGAGA CGGTGCAAGA TGTCCTTGCA ACGACATTGC TGGACAACCT 15420 

CTGCTATTTT AGAATCAATT TCCTGAAGGT GATAGCCGAT ATTTTCCATG GTAAAAACCA 15480 

ACTCCTCAAA CAAGCTCTCC ATGGTAAATT GATGATTAGG ATTTTGCAAG AGAATACCAA 15540 

CCGTCTGGAC ACGTTCGACG ATAGAAAGCT GACTGACCTC GCTCCCATCT ATCAGGACTT 15600 

GACCGCTATA GGGAAGAGAA CTAACTTGGG CAATCATTTG AAAGAGGCTG GATTTTCCAG 15660 

ACCCACTACT CCCAACTAAC AAGGTAAAGG CTTGCGCATG AAAAGTAAAA TCAAACGGCT 15720 




CAGAGAAGAT TGGGGACTGA ATCGCTCGTA GTTCCAGACC CATCTATGCT TTTCCTCCAG 15780 

TTGCAAACTG ATGATAGAGT TTGACAATGG CACGAACCAA GATGGTACAG AAGAAATAAA 15840 

CAGAAATAAA ACGTACCACA AGCAAGGAAA GGACAAACGG AAGGGAAAAG GCGTAGTAAC 15900 

CTAACTTAAT GTATTCATAG ACAAAGCTAA CAAGCGTAAT CCCAATACTA TTAGCAGTTA 15960 

GAGAGAGCCA ACTTTCATAG CGATTCTTAG TTACGATAAA ACCAAATTCA CTTCCCAAAC 16020 

CTTGAACAAA GCCAGACAAA AGAGCTCCTA GACCAAATTG GCTACCATAA AGGACTTCAG 16080 

CAAGCGCAGC TAGCACTTCT CCAATCGTTG CACTTCCGAC TCTCGGAACA AAGATGGCAG 16140 

CAATGGGCGC AGCCATACAC CAGAGACCGA AGAGGATTTC ATTGGCAAAG GCCTGCAAAC 16200 

CAAGAGGTGT TAAGAGTAGA CTGAGAATAT TATACACATA TCCTGAACCA ACGAAAACCC 16260 

CACCAAAAAA GATAGACAAG AAAGCAAGCA AGATAACATC TTTTAACTGC CATTTTTTCA 16320 

ACATAAAAAA CTCCTTTTTT TAAAGAAAAG TGAGGCACTC AAGAAGACCG ACCTAAATAC 16380 

TTTGTATAGC AGACTGAATT TAGAACAGTA CACAAGAACA CTAAAATATT TCTAGAAATT 16440 

AATTTGAATT TTCTAATTGA TTTGTTCGCA TCTTATTTCA ATCTACTATA TCATCTTCAT 16500 

CCAGTTTCGT AAAAGAAAAA ACTCTAATTA CAGATACAAA TTAGAGTTCA GCTTACAAGA 16560 

TTAGACAGTT CTTTTCGACA TACGAAAAAA ACATTTCACA TTTCCCTTCG CCAGTCTTAA 16620 

CTGTATCAGG TTCAATGGGT ATCATCTCAG CCTAAAGCAC CCCAAATGTC TTTATTATTT 16680 

AATTATGTGA TTATTATAAC ACACATTTTA TACTAGTTCA AGAAATTGAA CTGGAAATAC 16740 

AGCCTTGCAC TCACAAAGAC AGCAGATCTT TCTTTTGCAA AAAACAAATG ACCTGTTTGA 16800 

TGAATTAGCC ATTCAAGCTG AATCTGGACA TAGCTTTTTA AAAAAGGAAA ATCCTACTTA 16860 

CTTAGAATCC AAGGATAGAT ATCTATTGTT CACTCATTTC CCGAACAGTT TTTTCTATAT 16920 

TTTTTGCATA CGATATTGCC GAAATGATTG AAACGCCATC CATATTGGTC TTTATAATGT 16980 

CTTTAATATG TTTCGTCTGT ATCCCACCAA TTGCAACTAA AGGCATTTGT GGCAATAGTT 17040 

TTCTCATCAA TTCAAGACCT TCATAACCTA TAGTACCACC AGCATCATCC TTTGACTGGG 17100 

TACCAAATAC AGGCCCAACA CCTACATAAT CTACATATTC AACTTTTGAT TGTTGAAATT 17160 

CTTCTTCGTT TCTTATAGAA AGACCAATTA TTTTATCTGG CATCAATTTT CTAATTTCAT 17220 

CAACACCAAT ATCATCTTGA CCTACATGTA CGCCATCGGC GTCAATTTCC ATTGCTAAAT 17280 

CTATATCGTC ATTAACGATA AATGGAACAT TGTATTTTTT ACAAAGTTCT TTAATTTGGA 17340 

TAGCTAGCTC AAGTTTTTCT AAGCCTTCTA AAGCACCCTC ACCTTTTTCT CGAAATTGAA 17400 

ATAAGGTTAT ACCACCTTTT AAGGCTTCCT CAACGACTGT ATATAGATTT TTTCCTTGGC 17460 

AAGTAGTCGT TCCACAAATA AAATATAGTT TTAGTAATTC TTTATGAAAC ATCTTACTTC 17520 



